In terms of the different possible types of outsourced groups2, the numbers are as follows:
Definitely outsourced: 11%
Likely agency: 3%
High indicators: 3%
Characteristics of outsourced workers
Region
The plot below shows the proportion of workers within each region who are outsourced.3
Below we map the workforce composition in each region. The first map emphasises that London has the highest concentration of outsourced workers (25%).
The second map excludes London so that it is easier to see how the remaining regions compare. After London, the regions with the highest proportion of outsourced workers are:
East Midlands (19%)
West Midlands (18%)
Wales (18%)
North West (17%)
Northern Ireland (16%)
We can also explore how the UK’s outsourced workforce is distributed across the country.4 The table and map below show the outsourced workers in each region as a percentage of all outsourced workers in the UK. They show where the UK’s outsourced workforce is concentrated. The regions with the highest share of the UK’s outsourced workforce are:
London (21%)
North West (11%)
South East (11%)
West Midlands (9%)
East Midlands (8%)
| Region | Frequency | Sum | Percentage |
|---|---|---|---|
| London | 357.35 | 1708.36 | 20.92 |
| North West | 189.39 | 1708.36 | 11.09 |
| South East | 188.47 | 1708.36 | 11.03 |
| West Midlands | 161.49 | 1708.36 | 9.45 |
| East Midlands | 140.50 | 1708.36 | 8.22 |
| Scotland | 125.82 | 1708.36 | 7.37 |
| East of England | 125.49 | 1708.36 | 7.35 |
| South West | 120.50 | 1708.36 | 7.05 |
| Yorkshire and the Humber | 119.46 | 1708.36 | 6.99 |
| Wales | 83.25 | 1708.36 | 4.87 |
| North East | 53.06 | 1708.36 | 3.11 |
| Northern Ireland | 43.56 | 1708.36 | 2.55 |
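The Percentage column in the table above is each region's weighted frequency divided by the total across regions. The base R sketch below recomputes it from the rounded frequencies printed in the table; the data frame name `region_share` is illustrative and not from the original analysis, and because the inputs are rounded the recomputed total differs from the printed Sum in the second decimal place.

```r
# Weighted frequencies of outsourced workers by region, copied from the table above.
region_share <- data.frame(
  Region = c("London", "North West", "South East", "West Midlands",
             "East Midlands", "Scotland", "East of England", "South West",
             "Yorkshire and the Humber", "Wales", "North East",
             "Northern Ireland"),
  Frequency = c(357.35, 189.39, 188.47, 161.49, 140.50, 125.82,
                125.49, 120.50, 119.46, 83.25, 53.06, 43.56)
)

# Total weighted frequency across all regions (repeated on every row, as in the table).
region_share$Sum <- sum(region_share$Frequency)

# Each region's share of the UK's outsourced workforce, in percent.
region_share$Percentage <- 100 * region_share$Frequency / region_share$Sum

# Order from largest to smallest share and print to two decimal places.
region_share <- region_share[order(-region_share$Percentage), ]
print(format(region_share, nsmall = 2, digits = 2), row.names = FALSE)
```

Because the shares are computed within the outsourced workforce only, they sum to 100% and say nothing about how common outsourcing is within a region; that is what the earlier within-region percentages show.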
Sectors
Here we explore what proportion of workers in each sector are outsourced.5
The plot below shows the proportion of outsourced and not outsourced workers within each sector. That is, it shows which sectors have higher and lower proportions of outsourced workers.
The table below shows the percentage of outsourced workers in each Sector, ordered descending by percentage. It shows that the top three Sectors with the highest proportion of outsourced workers are:
ACTIVITIES OF HOUSEHOLDS AS EMPLOYERS; UNDIFFERENTIATED GOODS-AND SERVICES-PRODUCING ACTIVITIES OF HOUSEHOLDS FOR OWN US (note that N = 31)
ADMINISTRATIVE AND SUPPORT SERVICE ACTIVITIES
WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES
Note that an undefined sector (‘Not found’) contained one of the largest proportions of outsourced workers (31% of workers in the ‘Not found’ category were outsourced).
A key takeaway here is that, whereas 17% of the total workforce is outsourced, this figure varies by sector, from 0% for Mining… and Extraterritorial organisations… all the way to 36% for Activities of households as employers, with 5 out of 20 sectors having at least 20% of their workforce outsourced.
Gender
The outsourced workforce consists of a greater proportion of males than the non-outsourced workforce.6 Men make up 56% of the outsourced workforce compared to 47% of the non-outsourced workforce. This difference is statistically significant; outsourced workers, compared to non-outsourced workers, are 1.44 times more likely to be male than female.7
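As a rough cross-check on the reported odds ratio, the odds of being male rather than female can be recomputed from the two percentages quoted above. This is only a sketch: it uses the rounded shares (56% and 47%), treats the workforce as if it contained only men and women (ignoring the small ‘Other’ category), and the helper name `odds_ratio` is illustrative rather than part of the original analysis.

```r
# Odds ratio for a binary outcome, given the share with the outcome in a
# group of interest and in a reference group.
odds_ratio <- function(p_group, p_reference) {
  odds_group <- p_group / (1 - p_group)
  odds_reference <- p_reference / (1 - p_reference)
  odds_group / odds_reference
}

# Share of men among outsourced (0.56) and non-outsourced (0.47) workers.
or_male <- odds_ratio(p_group = 0.56, p_reference = 0.47)
round(or_male, 2)
```

The result lands close to the model-based figure, which is a reminder that "1.44 times more likely" here describes a ratio of odds, not a ratio of percentages (56 / 47 is only about 1.19).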
Breaking down by outsourcing group, we find that the group with the largest proportion of men in the workforce is the ‘high indicators’ group (66.35%), followed by the ‘likely agency’ group (56.66%), followed by the ‘outsourced’ group (53.94%). Statistically speaking, compared to a not outsourced person,
Someone in the high indicators group is 2.18 times more likely to be male than female.
Someone in the likely agency group is 1.45 times more likely to be male than female.
Someone in the outsourced group is 1.31 times more likely to be male than female.
Additionally, people identifying as ‘Other’ gender are absent from the high indicators and likely agency groups, though given the small N (14) for this group, this finding is unlikely to be meaningful.
Pay
Note
Note, the total sample on which income analysis is based is 8943.
The number of income data points for the outsourced group is 1512
The number of income data points for the not outsourced group is 7431
The table and plot below show descriptive statistics on income and its distribution for outsourced and non-outsourced people. Regression analysis shows that outsourced workers are on average paid £2170 less than non-outsourced workers.8
| Outsourcing group | n | Mean (£) | Median (£) | Min (£) | Max (£) | Standard dev. (£) |
|---|---|---|---|---|---|---|
| Not outsourced | 6924 | 26781.29 | 25120.67 | 2000 | 66250 | 13365.63 |
| Outsourced | 1367 | 24611.38 | 23061.99 | 2400 | 66108 | 12998.56 |
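The £2170 figure is, up to rounding, the gap between the two weighted means in the table (26781.29 − 24611.38 = 2169.91). This follows from how a weighted regression with a single two-level predictor works: its slope equals the difference between the groups' weighted means. The sketch below illustrates that identity on simulated data; the data frame `toy`, its values, and its weights are invented for the example and are not the survey data.

```r
set.seed(1)

# Simulated incomes and survey-style weights for two groups (illustrative only).
toy <- data.frame(
  status = factor(rep(c("Not outsourced", "Outsourced"), times = c(60, 40))),
  income = c(rnorm(60, mean = 27000, sd = 5000),
             rnorm(40, mean = 24500, sd = 5000)),
  w = runif(100, min = 0.5, max = 1.5)
)

# Weighted regression of income on outsourcing status
# ("Not outsourced" is the reference level because it sorts first).
mod <- lm(income ~ status, data = toy, weights = w)
gap_model <- unname(coef(mod)["statusOutsourced"])

# Weighted mean income within each group, and the gap between them.
group_means <- sapply(split(toy, toy$status),
                      function(d) weighted.mean(d$income, d$w))
gap_means <- unname(group_means["Outsourced"] - group_means["Not outsourced"])

# The two quantities agree up to floating-point error.
all.equal(gap_model, gap_means)
```

Once further predictors are added the coefficient no longer matches the raw gap, which is why the adjusted estimate differs from the descriptive one.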
This difference increases to £2943 when we take into account Age, Gender, Education, Ethnicity, Region, and Arrival Time.9 This analysis shows that all other variables, apart from Age, are in some way relevant to income. On average, and controlling for each of the other variables in the model:
Men earn £7021 more than women.
People who have a degree earn £8198 more than people without a degree.
Workers in all non-London regions earn less than workers in London:
East Midlands: -£5755
East of England: -£4060
North East: -£4813
North West: -£4451
Northern Ireland: -£6647
Scotland: -£5428
South East: -£3381
Wales: -£5345
West Midlands: -£4981
Yorkshire and the Humber: -£5489
People who arrived in the UK within the last year earn £6262 less than people born in the UK
People who arrived in the UK within the last 3 years earn £2561 less than people born in the UK
People who arrived in the UK within the last 5 years earn £2306 less than people born in the UK
People who arrived within the last 30 years earn £3292 more than people born in the UK.
Exploring the gender pay gap by outsourcing status indicates that the pay gap does not differ depending on whether workers are outsourced or not. For non-outsourced workers, females are paid £5800.82 less than males. For outsourced workers, females are paid £6399.5 less than males. The difference between non-outsourced and outsourced workers is not significant.
The interaction between gender and outsourcing status is also not relevant for whether a worker is low income (i.e. it has a non-significant relationship with income_group).
Notable takeaways:
There is a substantial gender pay gap present in the data. The pay gap is the same whether or not people are outsourced.
The South East is the highest-paid region after London. Northern Ireland is the lowest paid region.
People who have very recently arrived in the UK are paid less than people who were born in the UK, whilst people who migrated to the UK a long time ago earn more than people born in the UK.
Next we explore differences by outsourcing group. The table and plot below show descriptive statistics on income and its distribution for outsourced groups. Regression analysis shows that outsourced workers are on average paid £3100 less than non-outsourced workers, while no differences are evident for the likely agency and high indicators groups.12
| Outsourcing group | n | Mean (£) | Median (£) | Min (£) | Max (£) | Standard dev. (£) |
|---|---|---|---|---|---|---|
| Not outsourced | 6924 | 26781.29 | 25120.67 | 2000.0 | 66250.00 | 13365.63 |
| Outsourced | 897 | 23680.86 | 22165.73 | 2400.0 | 66000.00 | 12783.87 |
| Likely agency | 231 | 25081.11 | 22800.00 | 3194.7 | 65846.67 | 13702.90 |
| High indicators | 239 | 27921.52 | 25860.36 | 4644.0 | 65000.00 | 12629.15 |
However, when controlling, as before, for Age, Gender, Ethnicity, Arrival Time, and Region,13 we find
the outsourced group on average earns £3813 less than the non-outsourced group, and
the likely agency group on average earns £2603 less than the non-outsourced group
In addition to showing that likely agency workers receive lower pay than the non-outsourced workers, this analysis reveals that “pure outsourced” workers’ pay is even lower, and that the estimate we obtained in the analysis above considering only status is a diluted effect averaging the outsourced and likely agency pay gaps.
Variations in pay
Exploring this by type of outsourced worker shows that for all sectors, the majority of outsourced workers fall into the ‘outsourced’ group.14
The next most common group after ‘outsourced’ varies by sector. Many sectors have an almost even split of likely agency and high indicator groups. Sectors that are notable for having quite large likely agency proportions relative to high indicator proportions are:
Construction
Accommodation and food service activities
Activities of households as employers (note N = 32)
In contrast, sectors with a high proportion of ‘high indicators’ workers relative to likely agency workers are:
Other service activities
Professional, scientific and technical activities
Real estate activities
Variations in pay
Ethnicity
People from an ethnic minority are 1.88 times more likely to be outsourced than people from a White British background; 26.72% of outsourced workers are from an ethnic minority, compared to 16.26% of non-outsourced workers.15
Comparison of ethnicities indicates that some groups are statistically more likely to be outsourced than others16:
Asian workers are 1.943 times more likely than White workers to be outsourced.
Black workers are 2.287 times more likely than White workers to be outsourced.
Mixed Ethnicity workers are 1.828 times more likely than White workers to be outsourced.
Arab workers are 3.319 times more likely than White workers to be outsourced.
Breaking down by outsourcing group helps to separate out the type of outsourced work people from the ethnicities identified above engage in.17 Compared to White British workers,
Arab people are more likely to be likely agency or outsourced
Asian people are more likely to be in any of the groups
Black people are more likely to be likely agency or outsourced
People of mixed ethnicity are more likely to be outsourced
People who selected Other ethnicity are more likely to be agency
Arrival in the UK
As for non-outsourced workers, the vast majority of outsourced workers were born in the UK. However, people not born in the UK are more likely to be outsourced than people born in the UK. 24.13% of outsourced workers were not born in the UK, compared to 14.08% of non-outsourced workers.18 This difference is statistically significant; outsourced workers are 1.94 times more likely to have been born outside the UK than non-outsourced workers.19
Note
This variable is worded a little strangely, e.g. responses are things like “within the last 10 years”, “within the last 15 years”. Given that respondents only give one answer to this question, I think we can assume that the responses are basically brackets. That is, someone responding “within the last 15 years” is basically saying “I came to the UK between 11 and 15 years ago”.
Looking at the figure below, compared to non-outsourced people, there is a larger proportion of outsourced workers for each arrival time apart from ‘Within the last 30 years’.
Note
Note that all figures here should be interpreted as e.g. “75% of outsourced workers were born in the UK; 87% of non-outsourced workers were born in the UK”
Exploring types of outsourced work indicates that the pattern observed above applies evenly to the different outsourcing groups.21 Compared to people born in the UK, people not born in the UK are:
1.97 times more likely to be outsourced than non-outsourced
1.82 times more likely to be likely agency than non-outsourced
1.93 times more likely to be high indicators than non-outsourced
The figure below indicates that the proportions of workers in each outsourcing group are broadly similar across arrival times.
Exploring the intersection of ethnicity and arrival time reveals some patterns whereby the likelihood of a person being outsourced is related to the combinations of ethnicity and whether they were born in the UK.23 The plot below shows that
Among workers born in the UK, a Black worker is 2.01 times more likely to be outsourced than a White worker.
Among workers born in the UK, an Asian worker is 2.02 times more likely to be outsourced than a White worker.
Among workers born in the UK, an Other ethnicity worker is 4.3 times more likely to be outsourced than a White worker.
Among White workers, someone not born in the UK is 1.82 times more likely to be outsourced than someone born in the UK.
Among Mixed workers, someone not born in the UK is 2.73 times more likely to be outsourced than someone born in the UK.
Among Other workers, someone not born in the UK is 0.13 times as likely (i.e., 87% less likely) to be outsourced as someone born in the UK.
Put differently, being born in the UK is relevant in predicting outsourcing status only for White, Mixed, and Other ethnicities. For the remaining ethnicities, it doesn’t matter whether you are born in the UK or not. And compared to a White person born in the UK, Black and Asian workers are more likely to be outsourced whether or not they were born in the UK.
In summary, people born in the UK are more likely to be outsourced if they are Black, Asian, or ‘Other’ than if they are White. White and Mixed ethnicity workers are more likely to be outsourced if they were not born in the UK, whereas ‘Other’ ethnicity workers are less likely to be outsourced if they were not born in the UK.
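The "times more likely" figures above are odds ratios (the source computes them as the exponential of a logistic regression coefficient), so two conversions are useful when reading them. The sketch below shows the percent change in odds, which is how 0.13 becomes "87% less likely", and how an odds ratio translates into a probability for a given baseline. Both helper names and the 15% baseline are hypothetical choices made for illustration only.

```r
# Percent change in odds implied by an odds ratio (negative means lower odds).
or_to_percent_change <- function(or) {
  100 * (or - 1)
}

# Probability in the comparison group implied by an odds ratio and a
# baseline probability in the reference group.
or_to_probability <- function(or, p_baseline) {
  odds <- or * p_baseline / (1 - p_baseline)
  odds / (1 + odds)
}

# Odds ratios quoted in the text above.
pct_change <- or_to_percent_change(c(0.13, 1.82, 2.73))
round(pct_change)

# With a hypothetical 15% baseline, an odds ratio of 2 raises the
# probability to roughly 26%, i.e. less than double.
p_doubled_odds <- or_to_probability(or = 2, p_baseline = 0.15)
round(p_doubled_odds, 3)
```

The second helper shows why an odds ratio of 2 does not mean twice the probability unless the baseline is very small.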
We next explore arrival time by collapsing responses to the arrival time question into fewer categories, as below.
Someone who came to the UK recently is 6.24 times more likely to be outsourced than someone born in the UK.
Someone who came to the UK not recently is 1.73 times more likely to be outsourced than someone born in the UK.
Someone who preferred to not say when they arrived is times more likely to be outsourced than someone born in the UK.
Among Asian workers
Someone who came to the UK not recently is times more likely to be outsourced than someone born in the UK.
Someone who came to the UK not recently is times more likely to be outsourced than someone who came to the UK recently
Among Other workers
Someone who came to the UK not recently is times more likely to be outsourced than someone born in the UK.
In summary,
White outsourced workers are more likely to have not been born in the UK
Asian/Asian British and Other outsourced workers are more likely to have been in the UK a longer time (10 years plus)
UK-born Black and Asian workers are more likely to be outsourced than White UK-born workers, but no more or less likely to be outsourced than non-UK born Black and Asian workers (revise this)
Characteristics of outsourced work
Major occupations
Variations in pay
For Elementary occupations, there is a clear divergence evident in the pattern; for high income workers, being outsourced increases average income, whereas for low income workers, being outsourced decreases average income. For most other groups, being outsourced is associated with a lower income, regardless of income group.
Variations in pay
Unit occupations
Examining what unit occupations outsourced workers can be found in reveals that outsourced workers tend to be concentrated in a specific cluster of occupations.27 42% of outsourced workers are located in the top 10 most common unit occupations. The top 15 unit occupations capture over 50% of the outsourced workforce, and 76% of the outsourced workforce are captured in 30 unit occupations (out of a total of 96). These thresholds are shown in the plot below where the blue lines intersect the red curve.
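The concentration thresholds described above come from a cumulative share curve: occupations are sorted from most to least common, and the running total is expressed as a percentage of the whole outsourced workforce. The sketch below shows that calculation on simulated counts; `occ_counts` is invented data for 96 hypothetical occupations, and the helper `n_needed` is illustrative, so the printed shares will not match the survey figures.

```r
set.seed(42)

# Simulated weighted counts for 96 unit occupations (not the survey data),
# sorted from most to least common.
occ_counts <- sort(rexp(96, rate = 1), decreasing = TRUE)

# Cumulative percentage of the workforce covered by the top k occupations.
cum_share <- 100 * cumsum(occ_counts) / sum(occ_counts)

# Coverage at the thresholds discussed in the text.
coverage <- data.frame(
  top_n = c(10, 15, 30),
  cum_share = round(cum_share[c(10, 15, 30)], 1)
)
print(coverage, row.names = FALSE)

# Smallest number of occupations needed to reach a given coverage threshold.
n_needed <- function(threshold) {
  which(cum_share >= threshold)[1]
}
n_needed(50)
```

Plotting `cum_share` against the occupation rank gives the kind of curve described in the text, with the thresholds read off where the reference lines cross it.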
The top 10 unit occupations for outsourced workers are:
Functional Managers and Directors
Sales Assistants and Retail Cashiers
Caring Personal Services
Other Administrative Occupations
Information Technology Professionals
Elementary Cleaning Occupations
Teaching Professionals
Other Elementary Services Occupations
Road Transport Drivers
Nursing Professionals
These occupations differ in the extent to which outsourced workers are low paid.28 The 5 occupations with the highest proportion of low paid outsourced workers are:
---title: "Key findings final"author: - Jolyon Miles-Wilson - Celestin Okorojidate: "`r format(Sys.time(), '%e %B %Y')`"format: html: self-contained: true code-fold: true code-tools: true code-summary: "Code for Nerds" toc: true toc-depth: 5editor: visualexecute: echo: false warning: false---```{r packages}library(haven)library(poLCA)library(Hmisc)library(dplyr)library(ggplot2)library(tidyr)library(skimr)library(kableExtra)#library(MASS)library(wesanderson)library(ggrepel)library(here)library(emmeans)#library(devtools)#install_version("sjstats", version = "0.18.2")library(sjstats)library(readr)library(sjPlot)library(nnet)``````{r palette}rm(list = ls())options(scipen = 999)colours <- wes_palette("GrandBudapest2",4,"discrete")better_colours <- c('#8dd3c7','#bebada','#fb8072','#80b1d3','#fdb462')many_colours <- c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6','#6a3d9a','#ffff99','#b15928','#8dd3c7','#ffffb3','#bebada','#fb8072','#80b1d3','#fdb462','#b3de69','#fccde5','#d9d9d9','#bc80bd','#ccebc5','#ffed6f')``````{r functions}extract_glm_coefs <- function(mod, only_sig=F, decimal_places = 3){ coefs <- coef(summary(mod)) if(only_sig==T){ coefs <- coefs[which(coefs[,4] < .05),] } coefs <- as_tibble(coefs, rownames="variable") %>% # specify new variable to add rownames to mutate( or = round(exp(Estimate), decimal_places), .after=Estimate )}extract_lm_coefs <- function(mod, only_sig = F){ coefs <- coef(summary(mod)) if(only_sig==T){ coefs <- coefs[which(coefs[,4] < .05),] } coefs <- as_tibble(coefs, rownames="variable") # specify new variable to add rownames to }``````{r data, output=FALSE}#data <- haven::read_sav("../Data/2024-04-25 - Cleaned_Data.sav")data <- readRDS("../Data/2024-09-30 - Cleaned_Data.rds") data <- data %>% mutate( Ethnicity_collapsed = case_when( # Grouping White ethnicities Ethnicity %in% c(1,2,5 ) ~ "White/White British", # Grouping Asian ethnicities Ethnicity %in% c(10,11,12,13,14) ~ "Asian/Asian British", 
# Grouping Black ethnicities Ethnicity %in% c(15,16,17) ~ "Black/African/Caribbean/Black British", # Grouping Mixed ethnicities Ethnicity %in% c(6,7,8,9) ~ "Mixed/Multiple ethnic group", # Grouping Other ethnicities Ethnicity %in% c(18) ~ "Arab/British Arab", # Handling missing or ambiguous categories Ethnicity %in% c(3,4,19) ~ "Other ethnic group", #prefer not to say Ethnicity %in% c(20,21) ~ "Prefer not to say", # Default case for any unmatched entries TRUE ~ "Prefer not to say" ) )#make white the reference categorydata$Ethnicity_collapsed <- relevel(factor(data$Ethnicity_collapsed), ref = "White/White British")data <- data %>% mutate( Has_Degree = factor(Has_Degree, levels = c("No", "Yes", "Don't know")) )```# How many people are outsourced?```{r sum-outsourced}total_outsourced <- data %>% group_by(outsourcing_status) %>% summarise( Sum = sum(NatRepemployees) ) %>% mutate( Proportion = Sum / sum(Sum), Percentage = 100 * Proportion )readr::write_csv(total_outsourced, file="../outputs/data/total_outsourced.csv")# Create function to find nearest denominator to express as a fraction.f <- function(x) ifelse(abs(1/floor(1/x) - x) < abs(1/ceiling(1/x) - x),floor(1/x),ceiling(1/x))```**1 in `r f(total_outsourced$Proportion[which(total_outsourced$outsourcing_status=="Outsourced")])` (`r round(total_outsourced$Percentage[which(total_outsourced$outsourcing_status=="Outsourced")], 0)`%) of UK workers are outsourced.**[^1][^1]: [outputs/data/total_outsourced.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/total_outsourced.csv)```{r sum-outsourcing-group}total_outsourced <- data %>% group_by(outsourcing_group) %>% summarise( Sum = sum(NatRepemployees) ) %>% mutate( Proportion = Sum / sum(Sum), Percentage = 100 * Proportion )readr::write_csv(total_outsourced, file="../outputs/data/total_outsourced_2.csv")```In terms of the the different possible types of outsourced groups[^2], the numbers are as follows:[^2]: 
[outputs/data/total_outsourced_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/total_outsourced_2.csv)1. Definitely outsourced: `r round(total_outsourced$Percentage[which(total_outsourced$outsourcing_group=="Outsourced")], 0)`%2. Likely agency: `r round(total_outsourced$Percentage[which(total_outsourced$outsourcing_group=="Likely agency")], 0)`%3. High indicators: `r round(total_outsourced$Percentage[which(total_outsourced$outsourcing_group=="High indicators")], 0)`%# Characteristics of outsourced workers## RegionThe plot below shows the proportion of workers within each region who are outsourced.[^3][^3]: [outputs/data/region_stats_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_2.csv)```{r}region_statistics_2 <- data %>%# get values of labels# mutate_all(haven::as_factor) %>%group_by(Region, outsourcing_status) %>%summarise(Frequency =sum(NatRepemployees),n =n(), ) %>%mutate(N =sum(n),Sum =sum(Frequency),Percentage =100* (Frequency / Sum) ) %>%rename(`Outsourcing status`= outsourcing_status ) %>%ungroup()reg_levels <- region_statistics_2 %>%filter(`Outsourcing status`=="Outsourced") %>%mutate(Region = forcats::fct_reorder(Region, Percentage, .desc=FALSE) )annotation_df <- region_statistics_2 %>%filter(`Outsourcing status`=="Not outsourced") %>%select(Region, N) %>%mutate(ypos =100 )region_statistics_2 %>%mutate(Region =factor(Region, levels =levels(reg_levels$Region)) ) %>%ggplot(., aes(Region, Percentage, fill =`Outsourcing status`)) +geom_col(colour="black") +geom_text(inherit.aes=F, data = annotation_df, aes(Region, ypos, label =paste0("N=",N)), hjust=1, nudge_y =-2) +coord_flip() +scale_fill_manual(values=many_colours) +theme_minimal()readr::write_csv(region_statistics_2, file ="../outputs/data/region_stats_2.csv")region_statistics_2_1 <- region_statistics_2 %>%filter(`Outsourcing status`=="Outsourced"& Region !="London")london_perc <- region_statistics_2[which(region_statistics_2$Region 
=="London"& region_statistics_2["Outsourcing status"] =="Outsourced"), "Percentage"]```Below we map the workforce composition in each region. The first map emphasises that London has the highest concentration of outsourced workers (`r round(region_statistics_2[which(region_statistics_2$Region == "London" & region_statistics_2["Outsourcing status"] == "Outsourced"), "Percentage"],0)`%).```{r}knitr::include_graphics('../outputs/figures/outsourcing_by_region.svg')```The second map excludes London so that is easier to see how the remaining regions compare. After London, the regions with the highest proportion of outsourced workers are:1. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 1), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 1), "Percentage"],0)`%)2. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 2), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 2), "Percentage"],0)`%)3. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 3), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 3), "Percentage"],0)`%)4. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 4), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 4), "Percentage"],0)`%)5. 
`r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 5), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 5), "Percentage"],0)`%)```{r}knitr::include_graphics('../outputs/figures/outsourcing_by_region_excl_london.svg')``````{r}region_statistics_3 <- data %>%filter(outsourcing_status =="Outsourced") %>%# get values of labels# mutate_all(haven::as_factor) %>%group_by(Region) %>%summarise(Frequency =sum(NatRepemployees) ) %>%mutate(Sum =sum(Frequency),Percentage =100* (Frequency / Sum) )readr::write_csv(region_statistics_3, file ="../outputs/data/region_stats_3.csv")```We can also explore how the the entire UK workforce is distributed across the country.[^4] The table and map below show the percentage of outsourced workers in each region as a proportion of the total UK workforce. They show where the UK's outsourced workforce is concentrated. The regions with the highest share of the UK's outsourced workforce are:[^4]: [outputs/data/region_stats_3.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_3.csv)1. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 1), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 1), "Percentage"],0)`%)2. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 2), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 2), "Percentage"],0)`%)3. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 3), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 3), "Percentage"],0)`%)4. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 4), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 4), "Percentage"],0)`%)5. 
`r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 5), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 5), "Percentage"],0)`%)```{r}region_statistics_3 %>%mutate(Region = haven::as_factor(Region) ) %>%arrange(desc(Percentage)) %>% knitr::kable(.,digits =2) %>%kable_styling(full_width = F)``````{r}knitr::include_graphics('../outputs/figures/outsourcing_distribution_across_regions.svg')```## SectorsHere we explore what proportion of workers in each sector are outsourced.[^5][^5]: [outputs/data/sector_summary_3.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_3.csv)```{r sector-summary-3}sector_summary_3 <- data %>% #filter(income_drop_all == 0) %>% group_by(SectorName, SectorName_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), # avg_income = mean(income_annual_all, na.rm=T), # wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(SectorName) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum), SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled), SectorName_short = SectorName_labelled ) %>% # make the sector names more readable separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim=";", too_few = "align_start") %>% mutate( SectorName_short = factor(stringr::str_to_sentence(SectorName_short)), SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)), )write_csv(sector_summary_3, file="../outputs/data/sector_summary_3.csv")```The plot below shows the proportion of outsourced and not outsourced workers within each sector. I.e. 
this is showing what sectors have higher and lower proportions of outsourced workers.```{r sector-plot-2}plot_data <- sector_summary_3 %>% drop_na(SectorName_short) %>% droplevels() %>% ungroup()# Filter for 'outsourced' level and reorder SectorName_shortnot_outsourced_levels <- plot_data %>% filter(outsourcing_status == 'Not outsourced') %>% mutate(SectorName_short = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))outsourced <- plot_data %>% filter(outsourcing_status == 'Outsourced') %>% mutate( rank = rank(desc(perc)) )# Apply the reordered levels back to the original dataplot_data <- plot_data %>% mutate( SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)), )# annotation_df <- plot_data %>%# select(SectorName_short, outsourcing_status, perc, n# mutate(annotation_df <- plot_data %>% filter(outsourcing_status == "Not outsourced") %>% select(SectorName_short, N) %>% mutate( ypos = 80 )ggplot(plot_data, aes(SectorName_short, perc, fill = outsourcing_status)) + geom_col() + geom_text(inherit.aes=F,data=annotation_df, aes(x=SectorName_short, y=ypos, label = paste0("N = ", N)), hjust=1, nudge_y = 15) + coord_flip() + scale_fill_manual(values=many_colours) + scale_y_continuous(breaks=seq(0,100,10))# sector_key <- data.frame("number" = seq(1,length(unique(plot_data$SectorName_labelled)),1),# "Sector" = levels(plot_data$SectorName_labelled))# # sector_key %>%# kable() %>%# kable_styling(full_width = F)```The table below shows the percentage of outsourced workers in each Sector, ordered descending by percentage. 
It shows that the top three Sectors with the highest proportion of outsourced workers are:- `r unique(plot_data$SectorName_labelled[plot_data$SectorName==3])` (note that N = 31)- `r unique(plot_data$SectorName_labelled[plot_data$SectorName==4])`- `r unique(plot_data$SectorName_labelled[plot_data$SectorName==22])`Note that for an undefined sector ('Not found') contained one of the largest proportions of outsourced workers (`r round(plot_data$perc[which(plot_data$SectorName==16 & plot_data$outsourcing_status=="Outsourced")],0)`% of workers in the 'Not found' category were outsourced).A key takeaway here is that whereas the total outsourced population is 17%, this figure varies by sector, from 0% for Mining... and Extraterritoral organisations... all the way to `r round(outsourced[which(outsourced$rank==1),'perc'],0)`% for `r outsourced[which(outsourced$rank==1),'SectorName_short']`, with 5 out 20 sectors having at least 20% of their workforce outsourced.## Gender```{r}gender_statistics <- data %>%group_by(outsourcing_status, Gender) %>%summarise(n =n(),Frequency =sum(NatRepemployees) ) %>%mutate(N =sum(n),Sum =sum(Frequency),Percentage =100* (Frequency / Sum) )readr::write_csv(gender_statistics, file="../outputs/data/gender_statistics.csv")``````{r gender-outsourcing-status}mod <- multinom(Gender ~ outsourcing_status, data, weights=NatRepemployees)#summary(mod)# get coefficients and calcualte pcoefs <- summary(mod)$coefficientsors <- exp(coefs)colnames(ors) <- paste(colnames(ors), "or", sep="_")z <- coefs/summary(mod)$standard.errorsp <- (1 - pnorm(abs(z), 0, 1)) * 2colnames(p) <- paste(colnames(p), "p", sep="_")p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))sig_ors <- exp(summary(mod)$coefficients * p_2)coefs <- cbind(coefs, ors, p) %>% as_tibble()write_csv(coefs, file = "../outputs/data/gender_inferential_tab.csv")```The outsourced workforce consists of a greater proportion of males than the non-outsourced workforce.[^6] Men make up `r 
round(gender_statistics[which(gender_statistics$outsourcing_status == "Outsourced" & gender_statistics$Gender == "Male"),"Percentage"], 0)`% of the outsourced workforce compared to `r round(gender_statistics[which(gender_statistics$outsourcing_status == "Not outsourced" & gender_statistics$Gender == "Male"),"Percentage"], 0)`% of the non-outsourced workforce. This difference is statistically significant; outsourced workers, compared to non-outsourced workers, are `r round(sig_ors['Male', 'outsourcing_statusOutsourced'], 2)` times more likely to be male than female.[^7][^6]: [outputs/data/sector_summary_3.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_3.csv)[^7]: [../outputs/data/gender_inferential_tab.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/gender_inferential_tab.csv)```{r}# gender_statistics %>%# kable() %>%# kable_styling(full_width = F)gender_statistics %>%ggplot(., aes(outsourcing_status, Percentage, fill = Gender)) +geom_col(colour="black") +# annotate("text", x = gender_statistics$outsourcing_status, y = 75, label = paste0("n=", gender_statistics$Frequency)) +coord_flip() +scale_fill_manual(values=colours) +theme_minimal() +xlab("Outsourcing group") +annotate("text", x = gender_statistics$outsourcing_status, y =99, label =paste0("N = ", gender_statistics$N), hjust=1) ``````{r}gender_statistics_2 <- data %>%group_by(outsourcing_group, Gender) %>%summarise(n =n(),Frequency =sum(NatRepemployees) ) %>%mutate(N =sum(n),Sum =sum(Frequency),Percentage =100* (Frequency / Sum) )readr::write_csv(gender_statistics_2, file="../outputs/data/gender_statistics_2.csv")``````{r gender-outsourcing-group}mod <- multinom(Gender ~ outsourcing_group, data, weights=NatRepemployees)#summary(mod)# get coefficients and calcualte pcoefs <- summary(mod)$coefficientsors <- exp(coefs)colnames(ors) <- paste(colnames(ors), "or", sep="_")z <- coefs/summary(mod)$standard.errorsp <- (1 - pnorm(abs(z), 0, 1)) * 
2
colnames(p) <- paste(colnames(p), "p", sep = "_")
# keep only odds ratios significant at p < 0.01; the rest become NA
p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)

# add to table for saving
coefs <- cbind(coefs, ors, p) %>% as_tibble()
write_csv(coefs, file = "../outputs/data/gender_inferential_tab_2.csv")
```

Breaking down by outsourcing group, we find that the group with the largest proportion of men in the workforce is the 'high indicators' group (`r round(gender_statistics_2 %>% filter(outsourcing_group=="High indicators" & Gender == "Male") %>% pull(Percentage), 2)`%), followed by the 'likely agency' group (`r round(gender_statistics_2 %>% filter(outsourcing_group=="Likely agency" & Gender == "Male") %>% pull(Percentage), 2)`%), followed by the 'outsourced' group (`r round(gender_statistics_2 %>% filter(outsourcing_group=="Outsourced" & Gender == "Male") %>% pull(Percentage), 2)`%). Statistically speaking, compared to a non-outsourced person:

- Someone in the high indicators group has `r round(sig_ors['Male', 'outsourcing_groupHigh indicators'],2)` times the odds of being male rather than female.
- Someone in the likely agency group has `r round(sig_ors['Male', 'outsourcing_groupLikely agency'],2)` times the odds of being male rather than female.
- Someone in the outsourced group has `r round(sig_ors['Male', 'outsourcing_groupOutsourced'],2)` times the odds of being male rather than female.

Additionally, people identifying as 'Other' gender are absent from the high indicators and likely agency groups, though given the small N (`r sum(data$Gender=="Other")`) for this group, this finding is unlikely to be meaningful.

```{r}
# gender_statistics_2 %>%
#   kable() %>%
#   kable_styling(full_width = F)

gender_statistics_2 %>%
  ggplot(., aes(outsourcing_group, Percentage, fill = Gender)) +
  geom_col(colour = "black") +
  # annotate("text", x = gender_statistics$outsourcing_status, y = 75, label = paste0("n=", gender_statistics$Frequency)) +
  coord_flip() +
  scale_fill_manual(values = colours) +
  theme_minimal() +
  xlab("Outsourcing group") +
  annotate("text", x = gender_statistics_2$outsourcing_group, y = 99, label = paste0("N = ", gender_statistics_2$N), hjust = 1)
```

## Pay

::: callout-note
Note, the total sample on which income analysis is based is `r sum(!is.na(data$income_annual_all))`.

The number of income data points for the outsourced group is `r data %>% filter(outsourcing_status=="Outsourced") %>% summarise(sum(!is.na(income_annual_all))) %>% pull()`.

The number of income data points for the not outsourced group is `r data %>% filter(outsourcing_status=="Not outsourced") %>% summarise(sum(!is.na(income_annual_all))) %>% pull()`.
:::

```{r income}
# filter to just cases where income is above the fifth percentile and lower than the 95th? I.e., drop the top and bottom 5%.
income_statistics <- data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  group_by(outsourcing_status) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T))
  )

readr::write_csv(income_statistics, file = "../outputs/data/income_stats_o-status.csv")

income_data <- filter(data, income_drop_all == 0)

mod <- lm(income_annual_all ~ outsourcing_status, income_data, weights = NatRepemployees)
# summary(mod)

coef_table <- extract_lm_coefs(mod)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_lm_coefs(mod, only_sig = T)
write_csv(coef_table, file = "../outputs/data/model_income_by_o-status.csv")
```

The table and plot below show descriptive statistics on income and its distribution for outsourced and non-outsourced people. 
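
As a rough illustration of what the weighted summaries compute, the sketch below uses made-up incomes and weights (not survey data) and base R only. The `weighted_median()` helper is hypothetical and uses a simple cumulative-weight definition; `wtd.quantile()` (assumed here to come from the Hmisc package) may interpolate between values, so its results can differ slightly.

```r
# Toy data: four respondents with survey weights (illustrative values only).
income <- c(18000, 24000, 30000, 60000)
weight <- c(2, 1, 1, 1)

# Weighted mean: sum(w * x) / sum(w), the quantity stats::weighted.mean() returns.
wtd_mean <- sum(weight * income) / sum(weight)

# Hypothetical helper: the smallest income at which the cumulative share
# of weight reaches 50%. This is a simplification, not the Hmisc algorithm.
weighted_median <- function(x, w) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  x[which(cum_share >= 0.5)[1]]
}

wtd_median <- weighted_median(income, weight)
unweighted_median <- median(income)
```

In this toy case the heavily weighted low earner pulls the weighted median below the unweighted one, which is why the weighted and unweighted summaries in the tables need not agree.
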
Regression analysis shows that **outsourced workers are on average paid £`r abs(round(coef_table['outsourcing_statusOutsourced','Estimate'],0))` less than non-outsourced workers**.[^8]

[^8]: [outputs/data/income_stats_o-status.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/income_stats_o-status.csv) & [outputs/data/model_income_by_o-status.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_income_by_o-status.csv)

```{r income-plot}
knitr::kable(income_statistics, digits = 2, col.names = c("Outsourcing group", "n", "Mean", "Median", "Min", "Max", "Standard dev.")) %>%
  kable_styling(full_width = F)

# plot the distribution of income for the two groups
data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  ggplot(., aes(outsourcing_status, income_annual_all)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  geom_text(inherit.aes = F, data = income_statistics, aes(outsourcing_status, y = 6e+04), label = paste0("Mean = ", round(income_statistics$mean, 0), "\n", "Median = ", income_statistics$median), nudge_x = 0.1, hjust = 0) +
  coord_cartesian(xlim = c(1, 2.5)) +
  theme_minimal() +
  xlab("Outsourcing status") +
  ylab("Annual income") +
  coord_cartesian(ylim = c(plyr::round_any(min(income_statistics$min), 5000, f = floor), plyr::round_any(max(income_statistics$max), 5000, f = ceiling))) +
  scale_y_continuous(breaks = seq(plyr::round_any(min(income_statistics$min), 5000, f = ceiling), plyr::round_any(max(income_statistics$max), 5000, f = ceiling), 10000))
```

```{r}
#| output: false
mod <- lm(income_annual_all ~ Age + Gender + Ethnicity_collapsed + Region + outsourcing_status, income_data, weights = NatRepemployees)
summary(mod)

mod_2 <- lm(income_annual_all ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status, income_data, weights = NatRepemployees)
summary(mod_2)

mod_3 <- update(mod_2, ~ . + BORNUK_labelled)
summary(mod_3)
# anova(mod_2, mod_3) # adding BORNUK improves model fit

coef_table <- extract_lm_coefs(mod_3)
rownames(coef_table) <- coef_table$variable
# mod_3 is a linear model, so use the lm helper here
sig_coefs <- extract_lm_coefs(mod_3, only_sig = T)
write_csv(coef_table, file = "../outputs/data/model_2_income_by_o-status.csv")
```

This difference increases to £`r abs(round(coef_table['outsourcing_statusOutsourced','Estimate'],0))` when we take into account Age, Gender, Education, Ethnicity, Region, and Arrival Time.[^9] This analysis shows that all other variables, apart from Age, are in some way relevant to income. On average, and controlling for each of the other variables in the model:

[^9]: [outputs/data/model_2_income_by_o-status.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_2_income_by_o-status.csv)

- Men earn £`r abs(round(coef_table['GenderMale','Estimate'],0))` more than women.
- People who have a degree earn £`r abs(round(coef_table['Has_DegreeYes','Estimate'],0))` more than people without a degree.
- Workers in all non-London regions earn less than workers in London:
    - East Midlands: -£`r abs(round(coef_table['RegionEast Midlands','Estimate'],0))`
    - East of England: -£`r abs(round(coef_table['RegionEast of England','Estimate'],0))`
    - North East: -£`r abs(round(coef_table['RegionNorth East','Estimate'],0))`
    - North West: -£`r abs(round(coef_table['RegionNorth West','Estimate'],0))`
    - Northern Ireland: -£`r abs(round(coef_table['RegionNorthern Ireland','Estimate'],0))`
    - Scotland: -£`r abs(round(coef_table['RegionScotland','Estimate'],0))`
    - South East: -£`r abs(round(coef_table['RegionSouth East','Estimate'],0))`
    - Wales: -£`r abs(round(coef_table['RegionWales','Estimate'],0))`
    - West Midlands: -£`r abs(round(coef_table['RegionWest Midlands','Estimate'],0))`
    - Yorkshire and the Humber: -£`r abs(round(coef_table['RegionYorkshire and the Humber','Estimate'],0))`
- People who arrived in the UK within the last year earn £`r abs(round(coef_table['BORNUK_labelledWithin the last year','Estimate'],0))` less than people born in the UK.
- People who arrived in the UK 
within the last 3 years earn £`r abs(round(coef_table['BORNUK_labelledWithin the last 3 years','Estimate'],0))` less than people born in the UK.
- People who arrived in the UK within the last 5 years earn £`r abs(round(coef_table['BORNUK_labelledWithin the last 5 years','Estimate'],0))` less than people born in the UK.
- People who arrived within the last 30 years earn £`r abs(round(coef_table['BORNUK_labelledWithin the last 30 years','Estimate'],0))` more than people born in the UK.

### Income group[^10]

[^10]: [../outputs/data/income_group_outsourcing.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/income_group_outsourcing.csv)

```{r}
#| output: false
# test significance
mod <- glm(income_group ~ outsourcing_status, data, family = "quasibinomial", weights = NatRepemployees)
summary(mod)
test <- summary(mod)
or <- exp(mod[["coefficients"]][["outsourcing_statusOutsourced"]])
p <- test[["coefficients"]][2, 4]

mod_2 <- glm(income_group ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status + BORNUK_labelled, income_data, family = "quasibinomial", weights = NatRepemployees)
summary(mod_2)
# take the p-value from mod_2 (not the earlier model) and select the row by name,
# because outsourcing status is no longer the second coefficient
test <- summary(mod_2)
or <- exp(mod_2[["coefficients"]][["outsourcing_statusOutsourced"]])
p <- test[["coefficients"]]["outsourcing_statusOutsourced", 4]

coef_table <- extract_glm_coefs(mod_2)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_glm_coefs(mod_2, only_sig = T)
write_csv(coef_table, file = "../outputs/data/income_group_outsourcing.csv")
```

A person is more likely to be in the low income group if they:

- are older
- are female
- don't have a degree (or don't know if they have a degree?)
- are outsourced
- arrived in the UK in the last year

And less likely if they:

- are younger
- are male
- have a degree
- live in the North West or Wales (compared to London)
- arrived in the UK in the last 30 years

### Gender pay gap^[[outputs/data/gender_outsourced_gap.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/gender_outsourced_gap.csv) & [outputs/data/mod_gender_outsourcing.csv](outputs/data/mod_gender_outsourcing.csv)]

```{r gender-pay-gap-1}
gender_outsourced_gap <- income_data %>%
  group_by(outsourcing_status, Gender) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T))
  )

# gap in weighted median income (male minus female) for non-outsourced workers
not_outsourced_gap <- gender_outsourced_gap %>%
  filter(outsourcing_status == "Not outsourced") %>%
  select(c(outsourcing_status, Gender, median)) %>%
  pivot_wider(names_from = "Gender", values_from = "median") %>%
  mutate(
    diff = Male - Female
  ) %>%
  pull(diff)

# the same gap for outsourced workers
outsourced_gap <- gender_outsourced_gap %>%
  filter(outsourcing_status == "Outsourced") %>%
  select(c(outsourcing_status, Gender, median)) %>%
  pivot_wider(names_from = "Gender", values_from = "median") %>%
  mutate(
    diff = Male - Female
  ) %>%
  pull(diff)

write_csv(gender_outsourced_gap, "../outputs/data/gender_outsourced_gap.csv")
```

Exploring the gender pay gap by outsourcing status indicates that the pay gap does not differ depending on whether workers are outsourced or not. For non-outsourced workers, females are paid £`r round(not_outsourced_gap,2)` less than males. For outsourced workers, females are paid £`r round(outsourced_gap,2)` less than males. 
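
Whether a gender gap differs by outsourcing status can be checked with a `Gender * outsourcing_status` interaction in a linear model. The sketch below is a minimal illustration on made-up, noise-free data (the incomes, weights and the `toy` data frame are assumptions, not survey values). Note that the model compares weighted means, whereas the gaps quoted above are differences in weighted medians.

```r
# Toy data: one row per Gender x outsourcing_status cell (illustrative values).
toy <- expand.grid(
  Gender = c("Female", "Male"),
  outsourcing_status = c("Not outsourced", "Outsourced"),
  stringsAsFactors = TRUE
)
toy$weight <- c(1, 2, 1, 2)

# Scenario 1: men earn 4000 more in both groups, so the gap does not differ.
toy$income <- 25000 +
  4000 * (toy$Gender == "Male") -
  3000 * (toy$outsourcing_status == "Outsourced")

fit <- lm(income ~ Gender * outsourcing_status, data = toy, weights = weight)
coefs <- coef(fit)

# Gap among the reference group, and gap among outsourced workers
# (main effect plus interaction term).
gap_not_outsourced <- unname(coefs["GenderMale"])
gap_outsourced <- unname(
  coefs["GenderMale"] + coefs["GenderMale:outsourcing_statusOutsourced"]
)

# Scenario 2: outsourced men earn a further 1000, so the gap widens by 1000.
toy$income_wider <- toy$income +
  1000 * (toy$Gender == "Male" & toy$outsourcing_status == "Outsourced")
fit_wider <- lm(income_wider ~ Gender * outsourcing_status, data = toy, weights = weight)
interaction_wider <- unname(
  coef(fit_wider)["GenderMale:outsourcing_statusOutsourced"]
)
```

The interaction coefficient is the amount by which the gap changes between the two groups, so a value near zero (or a non-significant one in real data) corresponds to a gap that does not differ by outsourcing status.
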
The difference between non-outsourced and outsourced workers is not significant.

```{r gender-outsourcing-int}
#| output: false
ggplot(gender_outsourced_gap, aes(outsourcing_status, median, fill = Gender)) +
  geom_col(position = "dodge") +
  geom_label(aes(label = round(median, 0)), position = position_dodge(width = 0.9)) +
  theme_minimal() +
  ylab("Median income") +
  xlab("Outsourcing status")

simp_mod <- lm(income_annual_all ~ Gender*outsourcing_status, income_data, weights = NatRepemployees)
summary(simp_mod)

# simp_mod2 <- update(simp_mod, ~. + Has_Degree)
# summary(simp_mod2)
# anova(simp_mod, simp_mod2)

mod_2 <- lm(income_annual_all ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_status, income_data, weights = NatRepemployees)
summary(mod_2)

mod_3 <- update(mod_2, ~ . + BORNUK_labelled)
summary(mod_3)
anova(mod_2, mod_3) # adding BORNUK improves model fit

coef_table <- extract_lm_coefs(mod_3)
rownames(coef_table) <- coef_table$variable
# use the model fitted in this chunk rather than a stale `mod` object
sig_coefs <- extract_lm_coefs(mod_3, only_sig = T)
write_csv(coef_table, "../outputs/data/mod_gender_outsourcing.csv")
```

The gender by outsourcing status interaction is also not relevant for whether a worker is low income (i.e. 
non-significant relationship with `income_group`).

```{r}
#| output: false
mod <- glm(income_group ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_status + BORNUK_labelled, income_data, family = "quasibinomial", weights = NatRepemployees)
summary(mod)
# take the p-value from this model and select the row by name
test <- summary(mod)
or <- exp(mod[["coefficients"]][["outsourcing_statusOutsourced"]])
p <- test[["coefficients"]]["outsourcing_statusOutsourced", 4]

# save the coefficients of the income-group model fitted above (not the earlier lm)
coef_table <- extract_glm_coefs(mod)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_glm_coefs(mod, only_sig = T)
write_csv(coef_table, "../outputs/data/mod_gender_outsourcing_income_group.csv")
```

```{r}
#| output: false
# just test for low income group
low_pay_data <- income_data %>%
  filter(income_group == "Low")

mod_2 <- lm(income_annual_all ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_status, low_pay_data, weights = NatRepemployees)
summary(mod_2)

mod_3 <- update(mod_2, ~ . + BORNUK_labelled)
summary(mod_3)
```

Notable takeaways:

- There is a substantial gender pay gap present in the data. The pay gap does not differ significantly by outsourcing status.
- The South East is the highest-paid region after London. 
Northern Ireland is the lowest-paid region.
- People who have very recently arrived in the UK are paid less than people who were born in the UK, whilst people who migrated to the UK a long time ago earn more than people born in the UK.

```{r}
income_statistics <- data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  group_by(outsourcing_group) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T))
  )

readr::write_csv(income_statistics, file = "../outputs/data/income_stats_o-group.csv")

income_data <- filter(data, income_drop_all == 0)

mod <- lm(income_annual_all ~ outsourcing_group, income_data, weights = NatRepemployees)
# summary(mod)

coef_table <- extract_lm_coefs(mod)
rownames(coef_table) <- coef_table$variable
# mod is a linear model, so use the lm helper here
sig_coefs <- extract_lm_coefs(mod, only_sig = T)
write_csv(coef_table, file = "../outputs/data/model_income_by_o-group.csv")
```

Next we explore differences by outsourcing group. The table and plot below show descriptive statistics on income and its distribution for outsourced groups. 
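
When a grouping factor is the only predictor in a weighted linear model, each group coefficient is the difference between that group's weighted mean and the weighted mean of the reference level. The sketch below illustrates this on made-up values (the `toy` data frame, incomes and weights are assumptions), with 'Not outsourced' set as the reference level.

```r
# Toy data: three outsourcing groups, two respondents each (illustrative values).
toy <- data.frame(
  outsourcing_group = factor(
    c("Not outsourced", "Not outsourced", "Outsourced", "Outsourced",
      "Likely agency", "Likely agency"),
    levels = c("Not outsourced", "Outsourced", "Likely agency")
  ),
  income = c(30000, 34000, 24000, 28000, 31000, 33000),
  weight = c(1, 3, 2, 2, 1, 1)
)

# Weighted regression with the group factor as the only predictor.
fit <- lm(income ~ outsourcing_group, data = toy, weights = weight)

# Weighted mean income per group, computed directly.
wtd_means <- sapply(
  split(toy, toy$outsourcing_group),
  function(d) weighted.mean(d$income, d$weight)
)

# Each coefficient equals the group's weighted mean minus the reference mean.
diff_outsourced <- wtd_means[["Outsourced"]] - wtd_means[["Not outsourced"]]
diff_agency <- wtd_means[["Likely agency"]] - wtd_means[["Not outsourced"]]
coef_outsourced <- unname(coef(fit)["outsourcing_groupOutsourced"])
coef_agency <- unname(coef(fit)["outsourcing_groupLikely agency"])
```

Once further covariates are added, the coefficients are adjusted differences and no longer match the raw group means.
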
Regression analysis shows that **outsourced workers are on average paid £`r abs(round(coef_table['outsourcing_groupOutsourced','Estimate'],0))` less than non-outsourced workers**, while no differences are evident for the likely agency and high indicators groups.[^11]

[^11]: [outputs/data/income_stats_o-group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/income_stats_o-group.csv) & [outputs/data/model_income_by_o-group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_income_by_o-group.csv)

```{r}
knitr::kable(income_statistics, digits = 2, col.names = c("Outsourcing group", "n", "Mean", "Median", "Min", "Max", "Standard dev.")) %>%
  kable_styling(full_width = F)

data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  ggplot(., aes(outsourcing_group, income_annual_all)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  geom_text(inherit.aes = F, data = income_statistics, aes(outsourcing_group, y = 6e+04), label = paste0("Mean = ", round(income_statistics$mean, 0), "\n", "Median = ", round(income_statistics$median, 0), "\n N = ", income_statistics$n), nudge_x = 0.1, hjust = 0) +
  coord_cartesian(xlim = c(1, 2.5)) +
  theme_minimal() +
  xlab("Outsourcing group") +
  ylab("Annual income") +
  coord_cartesian(ylim = c(plyr::round_any(min(income_statistics$min), 5000, f = floor), plyr::round_any(max(income_statistics$max), 5000, f = ceiling))) +
  scale_y_continuous(breaks = seq(plyr::round_any(min(income_statistics$min), 5000, f = ceiling), plyr::round_any(max(income_statistics$max), 5000, f = ceiling), 10000))
```

```{r}
mod_2 <- lm(income_annual_all ~ Age + Gender + Ethnicity_collapsed + Region + outsourcing_group, income_data, weights = NatRepemployees)
# summary(mod_2)

mod_3 <- update(mod_2, ~ . + BORNUK_labelled)
# summary(mod_3)
# anova(mod_2, mod_3) # adding BORNUK improves model fit

coef_table <- extract_lm_coefs(mod_3)
rownames(coef_table) <- coef_table$variable
# mod_3 is a linear model, so use the lm helper here
sig_coefs <- extract_lm_coefs(mod_3, only_sig = T)
write_csv(coef_table, 
  file = "../outputs/data/model_2_income_by_o-group.csv")
```

However, when controlling, as before, for Age, Gender, Ethnicity, Arrival Time, and Region,[^12] we find:

[^12]: [outputs/data/model_2_income_by_o-group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_2_income_by_o-group.csv)

- the outsourced group on average earns **£`r abs(round(coef_table['outsourcing_groupOutsourced','Estimate'],0))` less** than the non-outsourced group, and
- the likely agency group on average earns **£`r abs(round(coef_table['outsourcing_groupLikely agency','Estimate'],0))` less** than the non-outsourced group.

In addition to showing that likely agency workers receive lower pay than the non-outsourced workers, this analysis reveals that "pure outsourced" workers' pay is even lower, and that the estimate we obtained in the analysis above considering only status is a diluted effect averaging the outsourced and likely agency pay gaps.

<!-- ## Pay differences -->

<!-- ```{r} -->
<!-- mod <- lm(income_annual_all ~ Age + Gender + Ethnicity_collapsed + BORNUK_labelled + Region + SectorName_short, income_data, weights = NatRepemployees) -->
<!-- summary(mod) -->
<!-- ``` -->

<!-- ```{r} -->
<!-- mod <- lm(income_annual_all ~ SectorName_short*outsourcing_status, income_data, weights = NatRepemployees) -->
<!-- summary(mod) -->
<!-- ``` -->

<!-- ```{r} -->
<!-- mod <- lm(income_annual_all ~ SectorName_short*outsourcing_group, income_data, weights = NatRepemployees) -->
<!-- summary(mod) -->
<!-- ## work out how to just plot certain levels! 
## -->
<!-- sjPlot::plot_model(mod, type = "pred", legend.title="", terms = c("SectorName_short","outsourcing_group"), dodge=1) + -->
<!-- coord_flip() -->
<!-- sig_coefs <- extract_lm_coefs(mod, only_sig = T) #, decimal_places = 10)# -->
<!-- ``` -->

### Variations in pay

```{r sector-bubble}
sector_summary_pay <- data %>%
  filter(income_drop_all == 0) %>%
  group_by(SectorName, SectorName_labelled, outsourcing_status) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_annual_all, na.rm = T),
    wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T)
  ) %>%
  ungroup() %>%
  group_by(SectorName) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA,
                                    TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim = ";", too_few = "align_start") %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)),
  )

write_csv(sector_summary_pay, file = "../outputs/data/sector_summary_pay.csv")

plot_data <- sector_summary_pay %>%
  drop_na(SectorName_short) %>%
  droplevels() %>%
  ungroup()

# Filter for 'outsourced' level and reorder SectorName_short
not_outsourced_levels <- plot_data %>%
  filter(outsourcing_status == 'Not outsourced') %>%
  mutate(SectorName_short = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))

outsourced <- plot_data %>%
  filter(outsourcing_status == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)),
  )

annotation_df <- plot_data %>%
  filter(outsourcing_status == "Not outsourced") %>% 
  select(SectorName_short, N) %>%
  group_by(SectorName_short) %>%
  summarise(
    N = sum(N)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm = T) * 1.2
  )

plot_data %>%
  # mutate(
  #   SectorName = as.factor(SectorName)
  # ) %>%
  ggplot(., aes(wtd_avg_income, SectorName_short, size = perc, colour = outsourcing_status)) +
  geom_point(position = "dodge") +
  theme_minimal() +
  theme(legend.position = "bottom",
        legend.title = element_blank()) +
  # coord_flip() +
  scale_x_continuous(breaks = seq(0, max(plot_data$wtd_avg_income, na.rm = T), 10000)) +
  scale_colour_manual(values = colours) +
  geom_text(inherit.aes = F, data = annotation_df, aes(x = ypos, y = SectorName_short, label = paste0("N = ", N)), hjust = 1) +
  guides(size = "none") + # remove size legend as gauging size is difficult
  xlab("Weighted average income") +
  ylab("Sector") +
  labs(caption = "Size of bubble represents the size of the respective workforce")
```

```{r}
sector_summary_paysplit <- data %>%
  filter(income_drop_all == 0) %>%
  group_by(SectorName, SectorName_labelled, income_group, outsourcing_status) %>%
  drop_na(income_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_annual_all, na.rm = T),
    wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T)
  ) %>%
  ungroup() %>%
  group_by(SectorName, income_group) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA,
                                    TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim = ";", too_few = "align_start") %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)),
  )

write_csv(sector_summary_paysplit, file = "../outputs/data/sector_summary_paysplit_o-status.csv")
```

```{r}
plot_data <- sector_summary_paysplit %>%
  drop_na(SectorName_short) %>%
  droplevels() %>%
  ungroup()

# Filter for 'outsourced' level and reorder SectorName_short
not_outsourced_levels <- plot_data %>%
  filter(outsourcing_status == 'Not outsourced') %>%
  mutate(SectorName_short = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))

outsourced <- plot_data %>%
  filter(outsourcing_status == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)),
  )

annotation_df <- plot_data %>%
  filter(outsourcing_status == "Not outsourced") %>%
  select(SectorName_short, N) %>%
  group_by(SectorName_short) %>%
  summarise(
    N = sum(N)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm = T) * 1.2
  )

plot_data %>%
  # mutate(
  #   SectorName = as.factor(SectorName)
  # ) %>%
  ggplot(., aes(wtd_avg_income, SectorName_short, size = perc, colour = outsourcing_status, shape = income_group)) +
  geom_point(position = "dodge") +
  theme_minimal() +
  theme(legend.position = "bottom",
        legend.title = element_blank()) +
  # coord_flip() +
  scale_x_continuous(breaks = seq(0, max(plot_data$wtd_avg_income, na.rm = T), 10000)) +
  scale_colour_manual(values = colours) +
  geom_text(inherit.aes = F, data = annotation_df, aes(x = ypos, y = SectorName_short, label = paste0("N = ", N)), hjust = 1) +
  guides(size = "none") + # remove size legend as gauging size is difficult
  xlab("Weighted average income") +
  ylab("Sector") +
  labs(caption = "Size of bubble represents the size of the respective workforce")
```

<!-- To test what sectors are more likely to employ outsourced workers, it is necessary to choose a "reference" sector against which to compare other sectors. A priori there is no theoretical candidate sector that should be the reference. However, we can select a reference based on what we know about the proportion of workers in each sector. One strategy is to choose the sector that has the lowest proportion of outsourced workers. 
Doing so means that interpretation is something along the lines of --><!-- "compared to the sector that we know has the smallest outsourced workforce, a worker is *x* times more likely to be outsourced if they work in Sector A" --><!-- From the figure above we can see that the sectors with the lowest proportion of outsourced workers are "Mining..." and "Activities of extraterritorial...", which both have zero outsourced workers. A problem with using these is that the sample sizes are very small. The next-lowest is "Agriculture...", but here the sample is quite small too. The next-lowest is "Public administration and defence", which has an outsourced workforce of around **`r round(sector_summary_3 %>% filter(SectorName_short == "Public administration and defence" & outsourcing_status == "Outsourced") %>% pull(perc),0)`%** and a sample size of **`r sector_summary_3 %>% filter(SectorName_short == "Public administration and defence" & outsourcing_status == "Outsourced") %>% pull(N)`**. This is probably the best candidate as a reference, because it has a reliable sample size and offers a low outsourcing baseline against which to compare other sectors. It is also quite neat that this sector is basically civil service, which also distinguishes it from other sectors. 
-->

```{r}
# relevel sectorname_short
data <- data %>%
  mutate(
    SectorName_short = forcats::fct_relevel(SectorName_short, "Public administration and defence")
  )
```

```{r include=FALSE}
mod <- glm(outsourcing_status ~ SectorName_short, data, weights = NatRepemployees, family = "quasibinomial")
summary(mod)

coef_table <- extract_glm_coefs(mod)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_glm_coefs(mod, only_sig = T) #, decimal_places = 10)#
```

```{r include=FALSE}
plot_model <- function(mod){
  coefs <- extract_glm_coefs(mod)
  confints <- confint(mod)
  vars <- rownames(confints)

  confints <- confints %>%
    as_tibble() %>%
    mutate(
      variable = vars, .before = everything()
    ) %>%
    rename(
      ci_low = `2.5 %`,
      ci_upp = `97.5 %`
    ) %>%
    mutate(
      ci_low = exp(ci_low),
      ci_upp = exp(ci_upp)
    )

  coef_table <- coefs %>%
    left_join(., confints, by = "variable") %>%
    filter(`Pr(>|t|)` < .05)

  max <- ceiling(max(coef_table$ci_upp))

  p <- ggplot(coef_table, aes(variable, or)) +
    geom_point() +
    geom_errorbar(aes(ymin = ci_low, ymax = ci_upp)) +
    coord_flip() +
    geom_hline(yintercept = 1, colour = "red") +
    scale_y_continuous(breaks = seq(0, max, 1))

  return(p)
}

plot_model(mod)
```

```{r}
sector_summary <- data %>%
  # filter(income_drop_all == 0) %>%
  group_by(SectorName, SectorName_labelled, outsourcing_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    # avg_income = mean(income_annual_all, na.rm=T),
    # wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm=T)
  ) %>%
  ungroup() %>%
  group_by(SectorName) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA,
                                    TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim = ";", too_few = "align_start") %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail 
      = factor(stringr::str_to_sentence(SectorName_short_detail)),
  )

write_csv(sector_summary, file = '../outputs/data/sector_summary_o-group.csv')
```

Exploring this by type of outsourced worker shows that for all sectors, the majority of outsourced workers fall into the 'outsourced' group.[^13]

[^13]: [outputs/data/sector_summary_o-group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_o-group.csv)

```{r}
plot_data <- sector_summary %>%
  drop_na(SectorName_short) %>%
  droplevels() %>%
  ungroup()

# Filter for 'outsourced' level and reorder SectorName_short
not_outsourced_levels <- plot_data %>%
  filter(outsourcing_group == 'Not outsourced') %>%
  mutate(SectorName_short = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))

outsourced <- plot_data %>%
  filter(outsourcing_group == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)),
  )

# annotation_df <- plot_data %>%
#   select(SectorName_short, outsourcing_status, perc, n
#   mutate(
annotation_df <- plot_data %>%
  filter(outsourcing_group == "Not outsourced") %>%
  select(SectorName_short, N) %>%
  mutate(
    ypos = 80
  )

ggplot(plot_data, aes(SectorName_short, perc, fill = outsourcing_group)) +
  geom_col() +
  geom_text(inherit.aes = F, data = annotation_df, aes(x = SectorName_short, y = ypos, label = paste0("N = ", N)), hjust = 1, nudge_y = 15) +
  coord_flip() +
  scale_fill_manual(values = many_colours) +
  scale_y_continuous(breaks = seq(0, 100, 10))
```

The next most common group after 'outsourced' varies by sector. Many sectors have an almost even split of likely agency and high indicator groups. 
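
The within-sector percentages compared here are each group's weighted frequency divided by the sector's total weighted frequency. A minimal base R sketch of that calculation, using made-up sector weights (the `toy` data frame and its values are assumptions, not survey figures):

```r
# Toy data: weighted frequencies by sector and outsourcing group (illustrative).
toy <- data.frame(
  sector = c("Construction", "Construction", "Construction",
             "Education", "Education"),
  outsourcing_group = c("Not outsourced", "Outsourced", "Likely agency",
                        "Not outsourced", "Outsourced"),
  weight = c(70, 20, 10, 90, 10)
)

# Weighted frequency for each sector x group combination.
freq <- aggregate(weight ~ sector + outsourcing_group, data = toy, FUN = sum)
names(freq)[names(freq) == "weight"] <- "Frequency"

# Sector total, repeated on every row of that sector, then the percentage.
freq$Sum <- ave(freq$Frequency, freq$sector, FUN = sum)
freq$perc <- 100 * freq$Frequency / freq$Sum

construction_outsourced <- freq$perc[
  freq$sector == "Construction" & freq$outsourcing_group == "Outsourced"
]
```

Because the denominator is the sector total, the percentages within each sector sum to 100, so groups can be compared within a sector but not directly across sectors of different sizes.
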
Sectors that are notable for having quite large likely agency proportions relative to high indicator proportions are:

- Construction
- Accommodation and food service activities
- Activities of households as employers (note N = 32)

In contrast, sectors with a high proportion of 'high indicators' workers relative to likely agency workers are:

- Other service activities
- Professional, scientific and technical activities
- Real estate activities

```{r}
annotation_df <- plot_data %>%
  filter(outsourcing_group != "Not outsourced") %>%
  select(SectorName_short, N) %>%
  mutate(
    ypos = 20
  )

plot_data %>%
  filter(outsourcing_group != "Not outsourced") %>%
  ggplot(., aes(SectorName_short, perc, fill = outsourcing_group)) +
  geom_col(position = "dodge") +
  geom_text(inherit.aes = F, data = annotation_df, aes(x = SectorName_short, y = ypos, label = paste0("N = ", N)), hjust = 1, nudge_y = 15) +
  coord_flip() +
  scale_fill_manual(values = many_colours) +
  labs(caption = "Note: N labels reflect total for sector including not outsourced (not shown here)")
# scale_y_continuous(breaks=seq(0,100,10))
```

### Variations in pay

```{r}
sector_summary <- data %>%
  filter(income_drop_all == 0) %>%
  group_by(SectorName, SectorName_labelled, outsourcing_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_annual_all, na.rm = T),
    wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T)
  ) %>%
  ungroup() %>%
  group_by(SectorName) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA,
                                    TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim = ";", too_few = "align_start") %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)),
  )

write_csv(sector_summary, file 
  = '../outputs/data/sector_summary_o-group_pay.csv')

plot_data <- sector_summary %>%
  drop_na(SectorName_short) %>%
  droplevels() %>%
  ungroup()

# Filter for 'outsourced' level and reorder SectorName_short
not_outsourced_levels <- plot_data %>%
  filter(outsourcing_group == 'Not outsourced') %>%
  mutate(SectorName_short = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))

outsourced <- plot_data %>%
  filter(outsourcing_group == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)),
  )

annotation_df <- plot_data %>%
  filter(outsourcing_group == "Not outsourced") %>%
  select(SectorName_short, N) %>%
  group_by(SectorName_short) %>%
  summarise(
    N = sum(N)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm = T) * 1.2
  )

plot_data %>%
  # mutate(
  #   SectorName = as.factor(SectorName)
  # ) %>%
  ggplot(., aes(wtd_avg_income, SectorName_short, size = perc, colour = outsourcing_group)) +
  geom_point(position = "dodge") +
  theme_minimal() +
  theme(legend.position = "bottom",
        legend.title = element_blank(),
        # axis.text.x = element_text(angle=45, hjust=1)
  ) +
  # coord_flip() +
  scale_x_continuous(breaks = seq(0, max(plot_data$wtd_avg_income, na.rm = T), 10000)) +
  scale_colour_manual(values = colours) +
  geom_text(inherit.aes = F, data = annotation_df, aes(x = ypos, y = SectorName_short, label = paste0("N = ", N)), hjust = 1) +
  guides(size = "none") + # remove size legend as gauging size is difficult
  xlab("Weighted average income") +
  ylab("Sector") +
  labs(caption = "Size of bubble represents the size of the respective workforce")
```

```{r}
sector_summary_paysplit <- data %>%
  # filter(income_drop_all == 0) %>%
  drop_na(income_group) %>%
  group_by(SectorName, SectorName_labelled, income_group, outsourcing_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    # avg_income = mean(income_annual_all, na.rm=T),
    # wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm=T)
  ) %>%
  ungroup() %>%
  group_by(SectorName) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA,
                                    TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim = ";", too_few = "align_start") %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)),
  )

# write the table built in this chunk (previously an unrelated object was written)
write_csv(sector_summary_paysplit, file = "../outputs/data/sector_summary_paysplit_o-group.csv")
```

```{r sector-paysplit-ogroup}
sector_summary_paysplit <- data %>%
  filter(income_drop_all == 0) %>%
  drop_na(income_group) %>%
  group_by(SectorName, SectorName_labelled, income_group, outsourcing_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_annual_all, na.rm = T),
    wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T)
  ) %>%
  ungroup() %>%
  group_by(SectorName) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA,
                                    TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim = ";", too_few = "align_start") %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)),
  )

# write the table built in this chunk (previously an unrelated object was written)
write_csv(sector_summary_paysplit, file = "../outputs/data/sector_summary_paysplit_o-group_pay.csv")

plot_data <- sector_summary_paysplit %>%
  drop_na(SectorName_short) %>%
  droplevels() %>%
  ungroup()

# Filter for 'outsourced' level and reorder SectorName_short
not_outsourced_levels <- plot_data %>%
  filter(outsourcing_group == 'Not outsourced') %>%
  mutate(SectorName_short 
         = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))

outsourced <- plot_data %>%
  filter(outsourcing_group == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)),
  )

annotation_df <- plot_data %>%
  filter(outsourcing_group == "Not outsourced") %>%
  select(SectorName_short, N) %>%
  group_by(SectorName_short) %>%
  summarise(
    N = sum(N)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm = T) * 1.2
  )

plot_data %>%
  # mutate(
  #   SectorName = as.factor(SectorName)
  # ) %>%
  ggplot(., aes(wtd_avg_income, SectorName_short, size = perc, colour = outsourcing_group, shape = income_group)) +
  geom_point() +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.justification = "right",
    legend.title = element_blank()
    # plot.margin = unit(c(1,1,1,1), "cm")
  ) +
  # coord_flip() +
  scale_x_continuous(breaks = seq(0, max(plot_data$wtd_avg_income, na.rm = T), 10000)) +
  scale_colour_manual(values = colours) +
  geom_text(inherit.aes = F, data = annotation_df,
            aes(x = ypos, y = SectorName_short, label = paste0("N = ", N)),
            hjust = 1) +
  guides(size = "none") + # remove size legend as gauging size is difficult
  xlab("Weighted average income") +
  ylab("Sector") +
  labs(caption = "Size of bubble represents the size of the respective workforce")
```

## Ethnicity

```{r}
ethnicity_statistics <- data %>%
  group_by(outsourcing_status, Ethnicity_collapsed) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum),
    Ethnicity_short = Ethnicity_collapsed
  ) %>%
  separate_wider_delim(Ethnicity_short,
                       names = c("Ethnicity_short", "Ethnicity detail"),
                       delim = stringr::regex(" / |, "), # use multiple delims
                       too_few = "align_start",
                       too_many = "merge")

readr::write_csv(ethnicity_statistics, file = "../outputs/data/ethnicity_stats_1.csv")
```

```{r ethnicity_inferential, output=FALSE}
ethnicities <- as.vector(unique(data$Ethnicity_collapsed))
non_white_ethnicities <- ethnicities[!(ethnicities %in% "White/White British")]

# Will throw NA warning. I think this OK but investigate how to avoid the problem
data <- data %>%
  mutate(
    Ethnicity_binary = forcats::fct_collapse(Ethnicity_collapsed,
                                             "White British" = c("White/White British"),
                                             "Non-White British" = non_white_ethnicities)
  )

mod <- glm(outsourcing_status ~ Ethnicity_binary, data, weights = NatRepemployees, family = "quasibinomial")
# mod <- glm(Ethnicity_binary~outsourcing_status , data, weights = NatRepemployees, family="quasibinomial")
# summary(mod)

coefs <- extract_glm_coefs(mod)

write_csv(coefs, file = "../outputs/data/ethnicity_binary_o-status_inferential_tab.csv")
```

People from an ethnic minority are `r round(coefs[2, 'or'],2)` times more likely to be outsourced than people from a White British background; `r round(100 - ethnicity_statistics[which(ethnicity_statistics$outsourcing_status == "Outsourced" & ethnicity_statistics$Ethnicity_collapsed == "White/White British"), "Percentage"],2)`% of outsourced workers are from an ethnic minority, compared to `r round(100 - ethnicity_statistics[which(ethnicity_statistics$outsourcing_status == "Not outsourced" & ethnicity_statistics$Ethnicity_collapsed == "White/White British"), "Percentage"],2)`% of non-outsourced workers.[^14]

[^14]: [outputs/data/ethnicity_stats_1.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_stats_1.csv) & [outputs/data/ethnicity_binary_o-status_inferential_tab.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_binary_o-status_inferential_tab.csv)

```{r}
data %>%
  group_by(outsourcing_status, Ethnicity_binary) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  ) %>%
  ggplot(., aes(outsourcing_status, Percentage, fill = Ethnicity_binary)) +
  geom_col(colour = "black") +
  annotate("text", x = ethnicity_statistics$outsourcing_status, y = 99,
           label = paste0("N = ", ethnicity_statistics$N), hjust = 1) +
  coord_flip() +
  scale_fill_manual(values = many_colours, name = "Ethnicity") +
  xlab("Outsourcing group") +
  theme_minimal()
```

```{r ethnicity-status}
mod <- glm(outsourcing_status ~ Ethnicity_collapsed, data, weights = NatRepemployees, family = "quasibinomial")
# summary(mod)

coef_table <- extract_glm_coefs(mod)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_glm_coefs(mod, only_sig = T)

write_csv(coef_table, file = "../outputs/data/ethnicity_model_inferential.csv")
```

Comparison of ethnicities indicates that some groups are statistically more likely to be outsourced than others[^15]:

[^15]: [outputs/data/ethnicity_model_inferential.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_model_inferential.csv)

- Asian workers are `r coef_table["Ethnicity_collapsedAsian/Asian British", "or"]` times more likely than White workers to be outsourced.
- Black workers are `r coef_table["Ethnicity_collapsedBlack/African/Caribbean/Black British", "or"]` times more likely than White workers to be outsourced.
- Mixed Ethnicity workers are `r coef_table["Ethnicity_collapsedMixed/Multiple ethnic group", "or"]` times more likely than White workers to be outsourced.
- Arab workers are `r coef_table["Ethnicity_collapsedArab/British Arab", "or"]` times more likely than White workers to be outsourced.

```{r ethnicity-group}
mod <- multinom(outsourcing_group ~ Ethnicity_collapsed, data, weights = NatRepemployees)
# summary(mod)

# get coefficients and calculate p
coefs <- summary(mod)$coefficients
# get predicted group names to insert later
group <- rownames(coefs)
ors <- exp(coefs)
colnames(ors) <- paste(colnames(ors), "or", sep = "_")
z <- coefs / summary(mod)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
colnames(p) <- paste(colnames(p), "p", sep = "_")
# keep only odds ratios with p < 0.01; the rest become NA
p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)

# add to table for saving
coefs2 <- cbind(coefs, ors, p) %>%
  as_tibble() %>%
  mutate(
    predicted_group = group, .before = everything() # insert predicted group so output table can be better interpreted
  )

write_csv(coefs2, file = "../outputs/data/ethnicity_ogroup_inferential_tab.csv")
# sig_ors
```

Breaking down by outsourcing group helps to separate out the *type* of outsourced work people from the ethnicities identified above engage in.[^16] Compared to White British workers:

[^16]: [outputs/data/ethnicity_ogroup_inferential_tab.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_ogroup_inferential_tab.csv)

- Arab people are more likely to be likely agency or outsourced
- Asian people are more likely to be in any of the groups
- Black people are more likely to be likely agency or outsourced
- People of mixed ethnicity are more likely to be outsourced
- People who selected Other ethnicity are more likely to be likely agency

```{r}
sjPlot::plot_model(mod)
```

## Arrival in the UK

```{r}
bornuk_statistics <- data %>%
  group_by(outsourcing_status, BORNUK_labelled) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

readr::write_csv(bornuk_statistics, file = "../outputs/data/arrival_in_UK_stats.csv")
```

```{r bornuk_inferential, output=FALSE}
categories <- as.vector(unique(data$BORNUK_labelled))
non_categories <- categories[!(categories %in% "I was born in the UK")]

# Will throw NA warning. I think this OK but investigate how to avoid the problem
data <- data %>%
  mutate(
    BORNUK_binary = forcats::fct_collapse(BORNUK_labelled,
                                          "Born in UK" = "I was born in the UK",
                                          "Not born in UK" = non_categories)
  )

mod <- glm(outsourcing_status ~ BORNUK_binary, data, weights = NatRepemployees, family = "quasibinomial")
# mod <- glm(Ethnicity_binary~outsourcing_status , data, weights = NatRepemployees, family="quasibinomial")
summary(mod)

coefs <- extract_glm_coefs(mod)

write_csv(coefs, file = "../outputs/data/bornuk_ostatus_inferential_tab.csv")
```

As with non-outsourced workers, the vast majority of outsourced workers were born in the UK. However, people not born in the UK are more likely to be outsourced than people born in the UK. `r 100 - round(bornuk_statistics[which(bornuk_statistics$outsourcing_status == "Outsourced" & bornuk_statistics$BORNUK_labelled == "I was born in the UK"), "Percentage"],2)`% of outsourced workers were not born in the UK, compared to `r 100 - round(bornuk_statistics[which(bornuk_statistics$outsourcing_status == "Not outsourced" & bornuk_statistics$BORNUK_labelled == "I was born in the UK"), "Percentage"],2)`% of non-outsourced workers.[^17] This difference is statistically significant; **outsourced workers are `r round(coefs %>% filter(variable == "BORNUK_binaryNot born in UK") %>% pull(or),2)` times more likely to have been born outside the UK than non-outsourced workers.**[^18]

[^17]: [outputs/data/arrival_in_UK_stats.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/arrival_in_UK_stats.csv)

[^18]: [outputs/data/bornuk_ostatus_inferential_tab.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/bornuk_ostatus_inferential_tab.csv)

::: callout-note
This variable is worded a little strangely, e.g. responses are things like "within the last 10 years", "within the last 15 years". Given that respondents only give one answer to this question, I think we can assume that the responses are basically brackets.
That is, someone responding "within the last 15 years" is basically saying "I came to the UK between 11 and 15 years ago".
:::

Looking at the figure below, compared to non-outsourced people, there is a larger proportion of outsourced workers for each arrival time apart from 'Within the last 30 years'.

::: callout-note
Note that all figures here should be interpreted as e.g. "75% of outsourced workers were born in the UK; 87% of non-outsourced workers were born in the UK"
:::

```{r}
# bornuk_statistics %>%
#   ggplot(., aes(outsourcing_status, Percentage, fill = BORNUK_labelled)) +
#   geom_col(colour="black", position = "dodge") +
#   annotate("text", x = bornuk_statistics$outsourcing_status, y = 75, label = paste0("n=",bornuk_statistics$N)) +
#   coord_flip() +
#   scale_fill_manual(values=many_colours, name="Arrival in UK") +
#   theme_minimal() +
#   xlab("Outsourcing group")

bornuk_statistics %>%
  ggplot(., aes(BORNUK_labelled, Percentage, fill = outsourcing_status)) +
  geom_col(colour = "black", position = "dodge") +
  geom_text(aes(BORNUK_labelled, y = 99, label = paste0("n = ", n)),
            position = position_dodge(width = 1), hjust = 1) +
  coord_flip() +
  scale_fill_manual(values = many_colours, name = "Outsourcing status") +
  theme_minimal() +
  xlab("Arrival in UK")
```

### Collapsed^[[/outputs/data/arrival_in_UK_collapsed_stats.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/arrival_in_UK_collapsed_stats.csv)]

```{r}
data <- data %>%
  mutate(
    BORNUK_collapsed = forcats::fct_collapse(BORNUK_labelled,
      "Born in UK" = "I was born in the UK",
      "Came to UK recently" = c("Within the last year"),
      "Came to UK not recently" = c("Within the last 3 years",
                                    "Within the last 5 years",
                                    "Within the last 10 years",
                                    "Within the last 15 years",
                                    "Within the last 20 years",
                                    "Within the last 30 years",
                                    "More than 30 years ago"),
      "Prefer not to say" = c("Prefer not to say")
    )
  )

bornuk_statistics <- data %>%
  group_by(outsourcing_status, BORNUK_collapsed) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

readr::write_csv(bornuk_statistics, file = "../outputs/data/arrival_in_UK_collapsed_stats.csv")

bornuk_statistics %>%
  ggplot(., aes(BORNUK_collapsed, Percentage, fill = outsourcing_status)) +
  geom_col(colour = "black", position = "dodge") +
  geom_text(aes(BORNUK_collapsed, y = 99, label = paste0("n = ", n)),
            position = position_dodge(width = 1), hjust = 1) +
  coord_flip() +
  scale_fill_manual(values = many_colours, name = "Outsourcing status") +
  theme_minimal() +
  xlab("Arrival in UK")
```

```{r}
mod <- multinom(outsourcing_group ~ BORNUK_binary, data, weights = NatRepemployees)
# summary(mod)

# get coefficients and calculate p
coefs <- summary(mod)$coefficients
ors <- exp(coefs)
colnames(ors) <- paste(colnames(ors), "or", sep = "_")
z <- coefs / summary(mod)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
colnames(p) <- paste(colnames(p), "p", sep = "_")
# keep only odds ratios with p < 0.01; the rest become NA
p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)

# add to table for saving
coefs <- cbind(coefs, ors, p) %>%
  as_tibble()

write_csv(coefs, file = "../outputs/data/bornuk_ogroup_inferential_tab.csv")
# sig_ors
```

```{r o-group}
bornuk_statistics_2 <- data %>%
  group_by(outsourcing_group, BORNUK_labelled) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

# Write the by-group table computed above (the original wrote bornuk_statistics here)
readr::write_csv(bornuk_statistics_2, file = "../outputs/data/arrival_in_UK_stats_2.csv")
```

Exploring *types* of outsourced work indicates that the pattern observed above applies evenly to the different outsourcing groups.[^19] Compared to people born in the UK, people not born in the UK are:

[^19]: [outputs/data/bornuk_ogroup_inferential_tab.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/bornuk_ogroup_inferential_tab.csv) & [/outputs/data/arrival_in_UK_stats_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/arrival_in_UK_stats_2.csv)

- `r round(sig_ors['Outsourced', 2],2)` times more likely to be outsourced than non-outsourced
- `r round(sig_ors['Likely agency', 2],2)` times more likely to be likely agency than non-outsourced
- `r round(sig_ors['High indicators', 2],2)` times more likely to be high indicators than non-outsourced

The figure below indicates that the proportions of workers in each outsourcing group are broadly similar within each arrival time.

```{r}
bornuk_statistics_2 %>%
  ggplot(., aes(BORNUK_labelled, Percentage, fill = outsourcing_group)) +
  geom_col(colour = "black", position = "dodge") +
  geom_text(aes(BORNUK_labelled, y = Percentage, label = paste0("n = ", n)),
            position = position_dodge(width = 1), hjust = 0, size = 3) +
  coord_flip() +
  scale_fill_manual(values = many_colours, name = "Outsourcing group") +
  theme_minimal() +
  xlab("Arrival in UK") +
  ylim(0, 100)
```

### Collapsed^[[/outputs/data/arrival_in_UK_o-group_collapsed_stats.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/arrival_in_UK_o-group_collapsed_stats.csv)]

```{r}
bornuk_statistics <- data %>%
  group_by(outsourcing_group, BORNUK_collapsed) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

readr::write_csv(bornuk_statistics, file = "../outputs/data/arrival_in_UK_o-group_collapsed_stats.csv")

bornuk_statistics %>%
  ggplot(., aes(BORNUK_collapsed, Percentage, fill = outsourcing_group)) +
  geom_col(colour = "black", position = "dodge") +
  geom_text(aes(BORNUK_collapsed, y = 99, label = paste0("n = ", n)),
            position = position_dodge(width = 1), hjust = 1) +
  coord_flip() +
  scale_fill_manual(values = many_colours, name = "Outsourcing group") +
  theme_minimal() +
  xlab("Arrival in UK")
```

## Interaction: Ethnicity and arrival time

```{r}
# main-effects model
base_mod <- glm(outsourcing_status ~ Ethnicity_collapsed + BORNUK_binary, data, weights = NatRepemployees, family = "quasibinomial")
# model with the ethnicity x born-in-UK interaction
mod <- glm(outsourcing_status ~ Ethnicity_collapsed * BORNUK_binary, data, weights = NatRepemployees, family = "quasibinomial")
# summary(mod)

# check that interaction improves the model over main effects - it does
anova(base_mod, mod, test = "F")

coefs <- extract_glm_coefs(mod)
```

```{r}
ems <- emmeans(mod, specs = "Ethnicity_collapsed", by = "BORNUK_binary")
cons <- summary(contrast(ems, "pairwise", adjust = "tukey"))
sig_cons <- cons %>%
  filter(p.value < .05) %>%
  mutate(
    or = 1 / exp(estimate), .after = estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison)
  )

write_csv(cons, file = "../outputs/data/ethnicity_bornUK_binary_contrasts.csv")
```

Exploring the intersection of ethnicity and arrival time shows that the likelihood of a person being outsourced depends on the combination of their ethnicity and whether they were born in the UK.[^20] The plot below shows that:

[^20]: [outputs/data/bornUK_binary_contrasts.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/bornUK_binary_contrasts.csv)

- Among workers born in the UK, a Black worker is `r round(sig_cons %>% filter(contrast == "(White/White British) - (Black/African/Caribbean/Black British)" & BORNUK_binary == "Born in UK") %>% pull(or),2)` times more likely to be outsourced than a White worker.
- Among workers born in the UK, an Asian worker is `r round(sig_cons %>% filter(contrast == "(White/White British) - (Asian/Asian British)" & BORNUK_binary == "Born in UK") %>% pull(or),2)` times more likely to be outsourced than a White worker.
- Among workers born in the UK, an Other ethnicity worker is `r round(sig_cons %>% filter(contrast == "(White/White British) - Other ethnic group" & BORNUK_binary == "Born in UK") %>% pull(or),2)` times more likely to be outsourced than a White worker.

```{r}
sjPlot::plot_model(mod, type = "pred", legend.title = "", terms = c("BORNUK_binary", "Ethnicity_collapsed"), dodge = 0.5) +
  coord_flip() +
  xlab("") +
  ylab("Likelihood of being outsourced") +
  theme_minimal()
```

```{r}
ems_2 <- emmeans(mod, specs = "BORNUK_binary", by = "Ethnicity_collapsed")
cons <- summary(contrast(ems_2, "pairwise", adjust = "tukey"))
sig_cons <- cons %>%
  filter(p.value < .05) %>%
  mutate(
    or = 1 / exp(estimate), .after = estimate # 1 / or because we want to express comparison - born in UK(ref) (contrast expresses born in UK(ref) - comparison)
  )

write_csv(cons, file = "../outputs/data/bornUK_binary_contrasts_2.csv")
```

Similarly, the plot below shows that[^21]:

[^21]: [outputs/data/region_stats_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_2.csv)

- Among White workers, someone not born in the UK is `r round(sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "White/White British") %>% pull(or),2)` times more likely to be outsourced than someone born in the UK.
- Among Mixed workers, someone not born in the UK is `r round(sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "Mixed/Multiple ethnic group") %>% pull(or),2)` times more likely to be outsourced than someone born in the UK.
- Among Other workers, someone not born in the UK is `r round(sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "Other ethnic group") %>% pull(or),2)` times as likely (i.e., `r round(100 * (1 - (sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "Other ethnic group") %>% pull(or))),0)`% less likely) to be outsourced than someone born in the UK.

```{r}
sjPlot::plot_model(mod, type = "pred", legend.title = "", terms = c("Ethnicity_collapsed", "BORNUK_binary"), dodge = 0.5) +
  coord_flip() +
  xlab("") +
  ylab("Likelihood of being outsourced") +
  theme_minimal()
```

Put differently, being born in the UK is relevant in predicting outsourcing status only for White, Mixed, and Other ethnicities. For the remaining ethnicities, there is no statistically significant difference between workers born in the UK and workers born elsewhere.
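The odds ratios quoted in this section are obtained by inverting pairwise contrasts (`1 / exp(estimate)`). The sketch below illustrates why that inversion expresses the comparison group relative to the reference group. It uses only base R and made-up counts (the `toy` data frame and all its values are hypothetical, not survey data), and an ordinary binomial `glm` rather than the weighted quasibinomial models and `emmeans` contrasts used in the analysis.

```r
# Hypothetical toy counts (NOT survey data): outsourced / not outsourced by birthplace
toy <- data.frame(
  born_uk    = factor(c("Born in UK", "Born in UK", "Not born in UK", "Not born in UK"),
                      levels = c("Born in UK", "Not born in UK")),
  outsourced = c(1, 0, 1, 0),
  count      = c(100, 900, 40, 160)
)

# Logistic regression with frequency weights; the reference level is "Born in UK"
fit <- glm(outsourced ~ born_uk, data = toy, weights = count, family = binomial)

# Log-odds difference expressed as (Not born in UK) - (Born in UK)
log_or <- unname(coef(fit)["born_ukNot born in UK"])

# A pairwise contrast expressed the other way round, (Born in UK) - (Not born in UK),
# has the opposite sign ...
estimate_ref_minus_comparison <- -log_or

# ... so inverting its exponent recovers the odds ratio for the comparison group
or_direct   <- exp(log_or)
or_inverted <- 1 / exp(estimate_ref_minus_comparison)

# Odds ratio computed by hand from the 2x2 counts, for comparison
or_by_hand <- (40 / 160) / (100 / 900)

print(c(direct = or_direct, inverted = or_inverted, by_hand = or_by_hand))
```

All three values agree, which is the sense in which `1 / exp(estimate)` converts a "reference - comparison" contrast into "comparison relative to reference". Strictly, these are ratios of odds rather than of probabilities.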
And compared to a White person born in the UK, Black and Asian workers are more likely to be outsourced whether or not they were born in the UK.In summary, people born in the UK are more likely to be outsourced if they are Black, Asian, or 'Other', compared to White, and White and mixed ethnicities are more likely to be outsourced if they are not born in the UK, whereas 'other' ethnicities are less likely to be outsourced if they were not born in the uk.```{r output=FALSE}data <- data %>% mutate( BORNUK_collapsed = forcats::fct_collapse(BORNUK_labelled, "Born in UK" = "I was born in the UK", "Came to UK recently" = c("Within the last year"), "Came to UK not recently" = c("Within the last 3 years", "Within the last 5 years", "Within the last 10 years", "Within the last 15 years", "Within the last 20 years", "Within the last 30 years", "More than 30 years ago"), "Prefer not to say" = c("Prefer not to say") ) )mod <- glm(outsourcing_status ~ Ethnicity_collapsed*BORNUK_collapsed, data, family="quasibinomial", weight = NatRepemployees)#summary(mod)ems <- emmeans(mod, specs = "Ethnicity_collapsed", by = "BORNUK_collapsed")cons <- summary(contrast(ems, "pairwise", adjust = "tukey"))sig_cons <- cons %>% filter(p.value < .05) %>% mutate( or = 1 / exp(estimate), .after=estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison) )# sig_conswrite_csv(cons, file = "../outputs/data/bornUK_collapsed_contrasts.csv")```We next explore arrival time by collapsing responses to the arrival time question into fewer categories as below+-------------------------+------------------------------+| Collapsed level | Original level |+=========================+==============================+| Born in UK | - I was born in the UK |+-------------------------+------------------------------+| Came to UK recently | - Within the last year |+-------------------------+------------------------------+| Came to UK not recently | - Within the last 3 
years || | || | - Within the last 5 years || | || | - Within the last 10 years || | || | - Within the last 15 years || | || | - Within the last 20 years || | || | - Within the last 30 years || | || | - More than 30 years ago |+-------------------------+------------------------------+| Prefer not to say | - Prefer not to say |+-------------------------+------------------------------+Exploring these categories[^22] confirms that[^22]: [outputs/data/region_stats_3.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_3.csv)- Among workers born in the UK, a Black worker is `r round(sig_cons %>% filter(contrast == "(White/White British) - (Black/African/Caribbean/Black British)") %>% pull(or),2)` times more likely to be outsourced than a White worker.- Among workers born in the UK, a Asian worker is `r round(sig_cons %>% filter(contrast == "(White/White British) - (Asian/Asian British)") %>% pull(or),2)` times more likely to be outsourced than a White worker.```{r}sjPlot::plot_model(mod, type ="pred", legend.title="", terms =c("BORNUK_collapsed","Ethnicity_collapsed"), dodge=0.5) +coord_flip() +xlab("") +ylab("Likelihood of being outsourced") +theme_minimal()``````{r}ems_2 <-emmeans(mod, specs ="BORNUK_collapsed", by ="Ethnicity_collapsed")cons <-summary(contrast(ems_2, "pairwise", adjust ="tukey"))sig_cons <- cons %>%filter(p.value < .05) %>%mutate(or =1/exp(estimate), .after=estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison) )# sig_conswrite_csv(cons, file ="../outputs/data/bornUK_collapsed_contrasts_2.csv")```And[^23][^23]: [outputs/data/bornUK_collapsed_contrasts_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/bornUK_collapsed_contrasts_2.csv)- Among White workers,- Someone who came to the UK recently is `r round(sig_cons %>% filter(contrast == "Born in UK - Came to UK recently" & Ethnicity_collapsed == "White/White British") %>% 
pull(or),2)` times more likely to be outsourced than someone born in the UK.- Someone who came to the UK not recently is `r round(sig_cons %>% filter(contrast == "Born in UK - Came to UK not recently" & Ethnicity_collapsed == "White/White British") %>% pull(or),2)` times more likely to be outsourced than someone born in the UK.- Someone who preferred to not say when they arrived is `r round(sig_cons %>% filter(contrast == "Born in UK - Prefer not to say" & Ethnicity_collapsed == "White/White British") %>% pull(or),2)` times more likely to be outsourced than someone born in the UK.- Among Asian workers - Someone who came to the UK not recently is `r round(sig_cons %>% filter(contrast == "Born in UK - Came to UK not recently" & Ethnicity_collapsed == "Asian/Asian British") %>% pull(or),2)` times more likely to be outsourced than someone born in the UK. - Someone who came to the UK not recently is `r round(sig_cons %>% filter(contrast == "Came to UK recently - Came to UK not recently" & Ethnicity_collapsed == "Asian/Asian British") %>% pull(or),2)` times more likely to be outsourced than someone who came to the UK recently- Among Other workers - Someone who came to the UK not recently is `r round(sig_cons %>% filter(contrast == "Born in UK - Came to UK recently" & Ethnicity_collapsed == "Other ethnic group") %>% pull(or),2)` times more likely to be outsourced than someone born in the UK.```{r}sjPlot::plot_model(mod, type ="pred", legend.title="", terms =c("Ethnicity_collapsed","BORNUK_collapsed"), dodge=0.5) +coord_flip()```In summary,- White outsourced workers are more likely to have not been born in the UK- Asian/Asian British and Other outsourced workers are more likely to have been in the UK a longer time (10 years plus)- UK-born Black and Asian workers are more likely to be outsourced than White UK-born workers, but no more or less likely to be outsourced than non-UK born Black and Asian workers (revise this)# Characteristics of outsourced work## Major 
occupations```{r MGC}data <- data %>% mutate( Majorgroupcode_labelled = na_if(Majorgroupcode_labelled, "NA") ) %>% mutate( Majorgroupcode_labelled = factor(stringr::str_to_sentence(Majorgroupcode_labelled)) )mgc_summary <- data %>% group_by(Majorgroupcode_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), # avg_income = mean(income_annual, na.rm=T), # wtd_avg_income = weighted.mean(income_annual, w = NatRepemployees, na.rm=T) ) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum) )readr::write_csv(mgc_summary, "../outputs/data/majorgroupcode_summary_o-status.csv")``````{r}plot_data <- mgc_summary# Filter for 'outsourced' level and reorder SectorName_shortnot_outsourced_levels <- plot_data %>%ungroup() %>%filter(outsourcing_status =='Not outsourced') %>%mutate(Majorgroupcode_labelled = forcats::fct_reorder(Majorgroupcode_labelled, perc, .desc =TRUE))# Apply the reordered levels back to the original dataplot_data <- plot_data %>%mutate(Majorgroupcode_labelled =factor(Majorgroupcode_labelled, levels =levels(not_outsourced_levels$Majorgroupcode_labelled)), )annotation_df <- plot_data %>%filter(outsourcing_status =="Not outsourced") %>%drop_na(Majorgroupcode_labelled) %>%select(Majorgroupcode_labelled, N) %>%mutate(ypos =80 )plot_data %>%drop_na(Majorgroupcode_labelled) %>%ggplot(aes(Majorgroupcode_labelled, perc, fill = outsourcing_status)) +geom_col() +coord_flip() +geom_text(inherit.aes=F,data=annotation_df, aes(x=Majorgroupcode_labelled, y=ypos, label =paste0("N = ", N)), hjust=1, nudge_y =15) +scale_fill_manual(values=many_colours, name ="Outsourcing status") +ylab("Percentage") +xlab("Major group") ``````{r}mgc_summary <- data %>%group_by(Majorgroupcode_labelled, outsourcing_group) %>%summarise(n =n(),Frequency =sum(NatRepemployees),# avg_income = mean(income_annual, na.rm=T),# wtd_avg_income = weighted.mean(income_annual, w = NatRepemployees, na.rm=T) ) %>%mutate(N =sum(n),Sum =sum(Frequency),perc =100* 
(Frequency/Sum) )readr::write_csv(mgc_summary, "../outputs/data/majorgroupcode_summary_o-group.csv")```### Variations in pay```{r mgc-bubble-status}mgc_summary_pay <- data %>% filter(income_drop_all == 0) %>% group_by(Majorgroupcode_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), avg_income = mean(income_annual_all, na.rm=T), wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(Majorgroupcode_labelled) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum) )write_csv(mgc_summary_pay, file="../outputs/data/mgc_summary_pay.csv")plot_data <- mgc_summary_pay %>% drop_na(Majorgroupcode_labelled) %>% droplevels() %>% ungroup()# Filter for 'outsourced' level and reorder SectorName_shortnot_outsourced_levels <- plot_data %>% filter(outsourcing_status == 'Not outsourced') %>% mutate(Majorgroupcode_labelled = forcats::fct_reorder(Majorgroupcode_labelled, perc, .desc = TRUE))outsourced <- plot_data %>% filter(outsourcing_status == 'Outsourced') %>% mutate( rank = rank(desc(perc)) )# Apply the reordered levels back to the original dataplot_data <- plot_data %>% mutate( Majorgroupcode_labelled = factor(Majorgroupcode_labelled, levels = levels(not_outsourced_levels$Majorgroupcode_labelled)), )annotation_df <- plot_data %>% #filter(outsourcing_status == "Not outsourced") %>% select(Majorgroupcode_labelled, n) %>% group_by(Majorgroupcode_labelled) %>% summarise( N = sum(n) ) %>% mutate( ypos = max(plot_data$wtd_avg_income, na.rm=T) * 1.2 ) plot_data %>% # mutate( # SectorName = as.factor(SectorName) # ) %>% ggplot(., aes(wtd_avg_income, Majorgroupcode_labelled, size = perc, colour = outsourcing_status)) + geom_point(position = "dodge") + theme_minimal() + theme(legend.position = "bottom", legend.title = element_blank())+ #coord_flip() + scale_x_continuous(breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 10000)) + scale_colour_manual(values=colours) + 
geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=Majorgroupcode_labelled, label = paste0("N = ", N)), hjust=1) + geom_text_repel(inherit.aes = F, aes(wtd_avg_income, Majorgroupcode_labelled, colour = outsourcing_status, label=paste0("n=",n)), size=3) + guides(size=FALSE) + # remove size legend as gauging size is difficult xlab("Weighted average income") + ylab("Major group code") + labs(caption = "Size of bubble represents the size of the respective workforce")``````{r mgc-bubble-status-2}mgc_summary_paysplit <- data %>% filter(income_drop_all == 0) %>% group_by(Majorgroupcode_labelled, income_group, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), avg_income = mean(income_annual_all, na.rm=T), wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(Majorgroupcode_labelled, income_group) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum) )write_csv(mgc_summary_paysplit, file="../outputs/data/mgc_summary_paysplit.csv")plot_data <- mgc_summary_paysplit %>% drop_na(Majorgroupcode_labelled) %>% droplevels() %>% ungroup()# Filter for 'outsourced' level and reorder SectorName_short# not_outsourced_levels <- plot_data %>%# filter(outsourcing_status == 'Not outsourced') %>%# mutate(Majorgroupcode_labelled = forcats::fct_reorder(Majorgroupcode_labelled, perc, .desc = TRUE))# # outsourced <- plot_data %>%# filter(outsourcing_status == 'Outsourced') %>%# mutate(# rank = rank(desc(perc))# )# # Here use the previous ordering so this plot can be compared with previous.# # Apply the reordered levels back to the original dataplot_data <- plot_data %>% mutate( Majorgroupcode_labelled = factor(Majorgroupcode_labelled, levels = levels(not_outsourced_levels$Majorgroupcode_labelled)), )annotation_df <- plot_data %>% # filter(outsourcing_status == "Not outsourced") %>% drop_na(income_group) %>% select(Majorgroupcode_labelled, n) %>% group_by(Majorgroupcode_labelled) %>% 
  summarise(
    N = sum(n)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm=T) * 1.2
  )

plot_data %>%
  drop_na(income_group) %>%
  # mutate(
  #   SectorName = as.factor(SectorName)
  # ) %>%
  ggplot(., aes(wtd_avg_income, Majorgroupcode_labelled, size = perc, colour = outsourcing_status, shape = income_group)) +
  geom_point(position = "dodge") +
  theme_minimal() +
  theme(legend.position = "bottom", legend.title = element_blank())+
  #coord_flip() +
  scale_x_continuous(breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 10000)) +
  scale_colour_manual(values=colours) +
  geom_text_repel(inherit.aes = F, aes(wtd_avg_income, Majorgroupcode_labelled, colour = outsourcing_status, shape = income_group, label=paste0("n=",n)), size=3) +
  geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=Majorgroupcode_labelled, label = paste0("N = ", N)), hjust=1) +
  guides(size=FALSE) + # remove size legend as gauging size is difficult
  xlab("Weighted average income") +
  ylab("Major group code") +
  labs(caption = "Size of bubble represents the size of the respective workforce")
```

For Elementary occupations, the pattern clearly diverges: for high-income workers, **being outsourced is associated with a higher average income**, whereas for low-income workers, **being outsourced is associated with a lower average income**.
For most other groups, being outsourced is associated with a lower income, regardless of income group.

```{r mgc-o-group}
plot_data <- mgc_summary

# Filter for 'outsourced' level and reorder SectorName_short
not_outsourced_levels <- plot_data %>%
  ungroup() %>%
  filter(outsourcing_group == 'Not outsourced') %>%
  mutate(Majorgroupcode_labelled = forcats::fct_reorder(Majorgroupcode_labelled, perc, .desc = TRUE))

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    Majorgroupcode_labelled = factor(Majorgroupcode_labelled, levels = levels(not_outsourced_levels$Majorgroupcode_labelled)),
  )

annotation_df <- plot_data %>%
  filter(outsourcing_group == "Not outsourced") %>%
  drop_na(Majorgroupcode_labelled) %>%
  select(Majorgroupcode_labelled, N) %>%
  mutate(
    ypos = 80
  )

plot_data %>%
  drop_na(Majorgroupcode_labelled) %>%
  ggplot(aes(Majorgroupcode_labelled, perc, fill = outsourcing_group)) +
  geom_col() +
  coord_flip() +
  geom_text(inherit.aes=F,data=annotation_df, aes(x=Majorgroupcode_labelled, y=ypos, label = paste0("N = ", N)), hjust=1, nudge_y = 15) +
  scale_fill_manual(values=many_colours, name = "Outsourcing group") +
  ylab("Percentage") +
  xlab("Major group") +
  theme_minimal()
```

### Variations in pay

```{r mgc-bubble-group}
mgc_summary_pay <- data %>%
  filter(income_drop_all == 0) %>%
  group_by(Majorgroupcode_labelled, outsourcing_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_annual_all, na.rm=T),
    wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm=T)
  ) %>%
  ungroup() %>%
  group_by(Majorgroupcode_labelled) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency/Sum)
  )

write_csv(mgc_summary_pay, file="../outputs/data/mgc_summary_pay_group.csv")

plot_data <- mgc_summary_pay %>%
  drop_na(Majorgroupcode_labelled) %>%
  droplevels() %>%
  ungroup()

# Filter for 'outsourced' level and reorder SectorName_short
not_outsourced_levels <- plot_data %>%
  filter(outsourcing_group == 'Not outsourced') %>%
  mutate(Majorgroupcode_labelled = forcats::fct_reorder(Majorgroupcode_labelled, perc, .desc = TRUE))

outsourced <- plot_data %>%
  filter(outsourcing_group == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    Majorgroupcode_labelled = factor(Majorgroupcode_labelled, levels = levels(not_outsourced_levels$Majorgroupcode_labelled)),
  )

annotation_df <- plot_data %>%
  # filter(outsourcing_status == "Not outsourced") %>%
  select(Majorgroupcode_labelled, n) %>%
  group_by(Majorgroupcode_labelled) %>%
  summarise(
    N = sum(n)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm=T) * 1.2
  )

plot_data %>%
  # mutate(
  #   SectorName = as.factor(SectorName)
  # ) %>%
  ggplot(., aes(wtd_avg_income, Majorgroupcode_labelled, size = perc, colour = outsourcing_group)) +
  geom_point(position = "dodge") +
  theme_minimal() +
  theme(legend.position = "bottom", legend.title = element_blank())+
  #coord_flip() +
  scale_x_continuous(breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 10000)) +
  scale_colour_manual(values=colours) +
  geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=Majorgroupcode_labelled, label = paste0("N = ", N)), hjust=1) +
  geom_text_repel(inherit.aes = F, aes(wtd_avg_income, Majorgroupcode_labelled, colour = outsourcing_group, label=paste0("n=",n)), size=3) +
  guides(size=FALSE) + # remove size legend as gauging size is difficult
  xlab("Weighted average income") +
  ylab("Major group code") +
  labs(caption = "Size of bubble represents the size of the respective workforce")
```

```{r mgc-bubble-group-2}
mgc_summary_paysplit <- data %>%
  filter(income_drop_all == 0) %>%
  group_by(Majorgroupcode_labelled, income_group, outsourcing_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_annual_all, na.rm=T),
    wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm=T)
  ) %>%
  ungroup() %>%
  group_by(Majorgroupcode_labelled, income_group) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency/Sum)
  )

write_csv(mgc_summary_paysplit, file="../outputs/data/mgc_summary_paysplit.csv")

plot_data <- mgc_summary_paysplit %>%
  drop_na(Majorgroupcode_labelled) %>%
  droplevels() %>%
  ungroup()

# Filter for 'outsourced' level and reorder SectorName_short
# not_outsourced_levels <- plot_data %>%
#   filter(outsourcing_group == 'Not outsourced') %>%
#   mutate(Majorgroupcode_labelled = forcats::fct_reorder(Majorgroupcode_labelled, perc, .desc = TRUE))
#
# outsourced <- plot_data %>%
#   filter(outsourcing_group == 'Outsourced') %>%
#   mutate(
#     rank = rank(desc(perc))
#   )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    Majorgroupcode_labelled = factor(Majorgroupcode_labelled, levels = levels(not_outsourced_levels$Majorgroupcode_labelled)),
  )

annotation_df <- plot_data %>%
  # filter(outsourcing_status == "Not outsourced") %>%
  drop_na(income_group) %>%
  select(Majorgroupcode_labelled, n) %>%
  group_by(Majorgroupcode_labelled) %>%
  summarise(
    N = sum(n)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm=T) * 1.2
  )

plot_data %>%
  drop_na(income_group) %>%
  # mutate(
  #   SectorName = as.factor(SectorName)
  # ) %>%
  ggplot(., aes(wtd_avg_income, Majorgroupcode_labelled, size = perc, colour = outsourcing_group, shape = income_group)) +
  geom_point(position = "dodge") +
  theme_minimal() +
  theme(legend.position = "bottom", legend.title = element_blank(), legend.justification = "right")+
  #coord_flip() +
  scale_x_continuous(breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 10000)) +
  scale_colour_manual(values=colours) +
  geom_text_repel(inherit.aes = F, aes(wtd_avg_income, Majorgroupcode_labelled, colour = outsourcing_group, shape = income_group, label=paste0("n=",n)), size=2) +
  geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=Majorgroupcode_labelled, label = paste0("N = ", N)), hjust=1) +
  guides(size=FALSE) + # remove size legend as gauging size is difficult
  xlab("Weighted average income") +
  ylab("Major group code") +
  labs(caption = "Size of bubble represents the size of the respective workforce")
```

## Unit occupations

```{r}
unit_occ_summary <- data %>%
  filter(outsourcing_status == "Outsourced") %>%
  group_by(UnitOccupation_labelled) %>%
  summarise(
    n = n()
  ) %>%
  mutate(
    UnitOcc_short = UnitOccupation_labelled
  ) %>%
  # make the occupation names more readable
  separate_wider_delim(UnitOcc_short, names = c("UnitOcc_short", "UnitOcc_short_detail"), delim = ", ", too_few = "align_start", too_many = "merge") %>%
  mutate(
    UnitOcc_short = forcats::fct_reorder(UnitOcc_short, n, .desc = FALSE),
    perc = 100 * (n / sum(n))
  ) %>%
  # rank from most to least common, then accumulate so that cum_perc at
  # rank r is the share held by the top r occupations, including rank r itself
  arrange(desc(perc)) %>%
  mutate(
    cum_perc = cumsum(perc),
    rank = row_number()
  )

write_csv(unit_occ_summary, file="../outputs/data/unit_occ_summary.csv")
```

Examining which unit occupations outsourced workers are found in reveals that they tend to be concentrated in a specific cluster of occupations.[^24] `r round(unit_occ_summary %>% filter(rank == 10) %>% pull(cum_perc),0)`% of outsourced workers are located in the 10 most common unit occupations. The top 15 unit occupations capture over 50% of the outsourced workforce, and `r round(unit_occ_summary %>% filter(rank == 30) %>% pull(cum_perc),0)`% of the outsourced workforce is captured in 30 unit occupations (out of a total of `r unit_occ_summary %>% summarise(max(rank)) %>% pull()`).
These thresholds are shown in the plot below where the blue lines intersect the red curve.

[^24]: [outputs/data/unit_occ_summary.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/unit_occ_summary.csv)

The top 10 unit occupations for outsourced workers are:

- `r unit_occ_summary %>% filter(rank == 1) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 2) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 3) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 4) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 5) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 6) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 7) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 8) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 9) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 10) %>% pull(UnitOccupation_labelled)`

```{r}
#| fig-height: 10
r10 <- unit_occ_summary %>%
  filter(rank == 10) %>%
  pull(UnitOcc_short)

r15 <- unit_occ_summary %>%
  filter(rank == 15) %>%
  pull(UnitOcc_short)

r30 <- unit_occ_summary %>%
  filter(rank == 30) %>%
  pull(UnitOcc_short)

unit_occ_summary %>%
  ggplot(aes(n, UnitOcc_short)) +
  geom_col() +
  geom_line(aes(cum_perc, UnitOcc_short, group=1), colour="red") +
  labs(caption = "Bars represent number of outsourced workers.\nRed line indicates cumulative percentage of all outsourced workers") +
  geom_hline(yintercept = r10, colour = "blue") +
  geom_hline(yintercept = r15, colour = "blue") +
  geom_hline(yintercept = r30, colour = "blue") +
  theme_minimal() +
  xlab("Number / Cumulative percentage") +
  ylab("Unit Occupation") +
  scale_x_continuous(breaks = seq(0,max(unit_occ_summary$n),10)) +
  theme(
    # axis.text.y =element_text(size = 4)
  )
```

```{r income-group-1}
# get the list of occupations
occs <- unit_occ_summary %>%
  slice_head(n=10) %>%
  mutate(UnitOccupation_labelled = as.character(UnitOccupation_labelled) ) %>%
  pull(UnitOccupation_labelled)

income_group_summary <- income_data %>%
  filter(!is.na(income_group)) %>%
  filter(outsourcing_status == "Outsourced") %>%
  filter(UnitOccupation_labelled %in% occs) %>%
  group_by(UnitOccupation_labelled, income_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum),
    UnitOccupation_labelled = factor(as.character(UnitOccupation_labelled))
  ) %>%
  ungroup()

write_csv(income_group_summary, "../outputs/data/unit_occ_income_group.csv")

most_low_paid <- income_group_summary %>%
  filter(income_group=="Low") %>%
  arrange(desc(Percentage)) %>%
  slice_head(n=5) %>%
  mutate(
    UnitOccupation_labelled = as.character(UnitOccupation_labelled),
    Percentage = round(Percentage, 2)
  )
```

These occupations differ in the extent to which outsourced workers are low paid.[^25] The 5 occupations with the highest proportion of low paid outsourced workers are:

[^25]: [outputs/data/unit_occ_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/unit_occ_income_group.csv)

1. `r most_low_paid[1,1] %>% pull()`: `r most_low_paid[1, "Percentage"] %>% pull()`%
2. `r most_low_paid[2,1] %>% pull()`: `r most_low_paid[2, "Percentage"] %>% pull()`%
3. `r most_low_paid[3,1] %>% pull()`: `r most_low_paid[3, "Percentage"] %>% pull()`%
4. `r most_low_paid[4,1] %>% pull()`: `r most_low_paid[4, "Percentage"] %>% pull()`%
5. `r most_low_paid[5,1] %>% pull()`: `r most_low_paid[5, "Percentage"] %>% pull()`%

The plot below visualises this.

```{r income-group-2}
levels <- income_group_summary %>%
  filter(income_group == "Low") %>%
  arrange(Percentage) %>%
  pull(UnitOccupation_labelled)

annotation_df <- income_group_summary %>%
  filter(income_group == "Low") %>%
  select(UnitOccupation_labelled, N) %>%
  mutate(
    ypos = 110
  )

income_group_summary %>%
  mutate(
    UnitOccupation_labelled = factor(UnitOccupation_labelled, levels=levels)
  ) %>%
  ggplot(aes(Percentage,UnitOccupation_labelled, fill = income_group)) +
  geom_col(position="dodge") +
  geom_text(inherit.aes=F, data = annotation_df, aes(y=UnitOccupation_labelled, x=ypos, label = paste0("N=",N)), hjust=1, nudge_x = 2) +
  scale_x_continuous(breaks=seq(0,100,10)) +
  theme_minimal() +
  scale_fill_manual(name = "Income group", values = better_colours) +
  ylab("Unit occupation") +
  ggtitle("Top 10 occupations by income group") +
  labs(caption="Includes outsourced workers only")
```
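
The low-paid share reported for each occupation is a weighted within-group percentage: each respondent contributes their survey weight rather than a count of one. As a minimal sketch of that calculation, the chunk below applies the same `Frequency / Sum` logic to a small made-up data frame. The data frame `toy`, its values, the `"Not low"` label, and the names `weight` and `toy_summary` are illustrative assumptions and are not part of the survey data.

```r
library(dplyr)

# Hypothetical toy data: two occupations, each respondent carrying a survey weight
toy <- data.frame(
  occupation   = c("A", "A", "A", "B", "B"),
  income_group = c("Low", "Low", "Not low", "Low", "Not low"),
  weight       = c(1, 2, 1, 1, 3)
)

toy_summary <- toy %>%
  group_by(occupation, income_group) %>%
  summarise(
    n = n(),                   # unweighted respondents in the cell
    Frequency = sum(weight),   # weighted respondents in the cell
    .groups = "drop_last"      # stay grouped by occupation only
  ) %>%
  mutate(
    Sum = sum(Frequency),                  # weighted total per occupation
    Percentage = 100 * (Frequency / Sum)   # weighted share within occupation
  ) %>%
  ungroup()

toy_summary
```

In this toy data, two of the three respondents in occupation A are low paid (about 67% unweighted), but because one of them carries a larger weight the weighted share is 75%. This is why weighted and unweighted percentages can differ.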